Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models
Samuel Stevens, Wei-Lun (Harry) Chao, Tanya Berger-Wolf, Yu Su
The Ohio State University
Links: Code · Demos · API Docs · Paper (coming soon)
saev is a package for training sparse autoencoders (SAEs) on vision transformers (ViTs) in PyTorch.
It also includes some interactive demos for scientifically rigorous interpretation of ViTs.
API reference docs are available below, and the source code is on GitHub.
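To give a sense of what the package trains: an SAE maps each ViT activation vector to a much wider, sparse code and reconstructs the activation from it. Below is a minimal conceptual sketch in plain PyTorch. It is not saev's implementation; the width, expansion factor, and sparsity coefficient are illustrative assumptions, and the real hyperparameters live in the package config (see the API docs).

import torch
from torch import nn

class SparseAutoencoder(nn.Module):
    """Single-layer autoencoder with an L1 sparsity penalty on the hidden code.

    d_model=768 matches ViT-B; the expansion factor is an illustrative guess.
    """

    def __init__(self, d_model: int = 768, expansion: int = 16):
        super().__init__()
        d_hidden = d_model * expansion
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # ReLU keeps the code non-negative; the L1 term below pushes it sparse.
        f = torch.relu(self.encoder(x))
        x_hat = self.decoder(f)
        return x_hat, f

def loss(x, x_hat, f, sparsity_coeff: float = 4e-4):
    # Reconstruction error plus an L1 penalty on the code; the coefficient
    # trades off reconstruction fidelity against sparsity.
    return (x - x_hat).pow(2).mean() + sparsity_coeff * f.abs().sum(dim=-1).mean()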
Demos
There are two web-based demos: one for interpreting bird classification and one for interpreting semantic segmentation.
API Docs
We provide auto-generated API docs based on docstrings, along with a guide to getting started. You can also paste the entire package source from llms.txt into a long-context model and ask it questions.
- saev: A package to train SAEs for vision transformers in PyTorch.
- contrib: Sub-packages for applying trained SAEs to downstream tasks, such as the demos above (a sketch of the end-to-end workflow follows below).
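To make the workflow concrete, here is a hedged end-to-end sketch: it pulls patch-token activations out of a pretrained ViT with a forward hook and runs one SAE training step, reusing the SparseAutoencoder and loss from the sketch above. The torchvision ViT-B/16 backbone, the hooked block, the random stand-in batch, and the optimizer settings are all assumptions for illustration; saev's actual data pipeline and entry points are described in the API docs.

import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights

# Load a pretrained ViT-B/16; any ViT exposing intermediate blocks would do.
vit = vit_b_16(weights=ViT_B_16_Weights.DEFAULT).eval()

acts = []

def hook(_module, _inputs, output):
    # output: (batch, tokens, d_model) token activations from this block.
    acts.append(output.detach())

# Hook a late encoder block; which layer to interpret is a modeling choice.
vit.encoder.layers[-2].register_forward_hook(hook)

images = torch.randn(8, 3, 224, 224)  # stand-in for a preprocessed batch
with torch.no_grad():
    vit(images)

# Flatten (batch, tokens, d_model) -> (batch * tokens, d_model) for the SAE.
batch = acts[0].flatten(0, 1)

sae = SparseAutoencoder(d_model=batch.shape[-1])
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

x_hat, f = sae(batch)
opt.zero_grad()
loss(batch, x_hat, f).backward()
opt.step()

In practice one would cache activations over a whole dataset and train for many steps; this only illustrates the shape of the data moving through the pipeline.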
References & Citations
Please cite our preprint or our code, whichever is more relevant to your work.
Code:
@software{stevens2025saev,
  title = {{saev}},
  author = {Stevens, Samuel and Chao, Wei-Lun and Berger-Wolf, Tanya and Su, Yu},
  license = {MIT},
  url = {https://github.com/osu-nlp-group/saev}
}
Preprint: Coming soon.
Acknowledgements
We would like to thank our colleagues from the Imageomics Institute, the ABC Center, and the OSU NLP group for their insightful and valuable feedback.
This work was supported by the Imageomics Institute, which is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under Award #2118240 (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.