# saev

saev is a framework for training and evaluating sparse autoencoders (SAEs) for vision transformers (ViTs), implemented in PyTorch.

## Installation

Installation is supported with uv. saev will likely work with pure pip, conda, etc., but I will not formally support it.

Clone this repository, then from the root directory:

```sh
uv run python -m saev --help
```

This will create a virtual environment and display the CLI help.
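
Spelled out end to end, the install looks something like the following; the repository URL here is my assumption, so substitute wherever you actually cloned from:

```sh
# URL is assumed for illustration; use the real repository location.
git clone https://github.com/OSU-NLP-Group/saev.git
cd saev
uv run python -m saev --help  # creates the venv and prints the CLI help
```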

## Quick Start

Save some activations to disk:

```sh
uv run scripts/launch.py shards \
  --shards-root $SCRATCH/saev/shards \
  --family clip \
  --ckpt ViT-B-32/openai \
  --layers 11 \
  --patches-per-ex 49 \
  --batch-size 256 \
  data:cifar10
```

Here, `--patches-per-ex 49` matches ViT-B/32 on 224×224 inputs: each image is split into 32×32 patches, giving (224/32)² = 49 patches per example. Read the guide for details.
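
After the command finishes, the shards directory holds layer-11 activations for every image. As a quick sanity check, you could memory-map the cache and inspect shapes; note that the file name and array layout below are assumptions for illustration, since the actual shard format is documented in the guide:

```python
import numpy as np

# Assumed layout: float32 activations of shape (n_imgs, n_patches, d_vit).
# CIFAR-10 has 50,000 training images; ViT-B/32 yields 49 patches of width 768.
acts = np.memmap(
    "shards/acts.bin",  # hypothetical file name
    dtype=np.float32,
    mode="r",
    shape=(50_000, 49, 768),
)
print(acts.shape, acts[0, 0, :4])  # peek at the first patch vector
```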

## Why saev?

There are plenty of alternative libraries for SAEs:
- Overcomplete, primarily developed by Thomas Fel.
However, saev has some benefits:
- saev is more of a framework than a library. SAEs train a relatively small neural network on an enormous number of activations; while you could compute those activations in a simple inference loop during training, efficient training requires caching them to disk first. This makes using saev a little more like Keras or PyTorch Lightning than Huggingface's Transformers or Datasets libraries (see the sketch after this list).
- saev offers lots of tools for interacting with sparse autoencoders after training, including interactive notebooks and evaluations.
- saev includes complete code from preprints in the `contrib/` directory, along with logbooks describing how the authors used and developed saev.
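
To make the framework point above concrete, here is a minimal sketch of the training pattern saev is organized around; every name in it is hypothetical, and it is not saev's actual API. The ViT runs once to produce a cache, and the small SAE then streams batches from disk instead of re-running the expensive model every epoch:

```python
import torch
import torch.nn as nn

class TinySAE(nn.Module):
    """A minimal sparse autoencoder: encode, rectify, decode."""
    def __init__(self, d_vit: int = 768, d_sae: int = 16_384):
        super().__init__()
        self.enc = nn.Linear(d_vit, d_sae)
        self.dec = nn.Linear(d_sae, d_vit)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        f = torch.relu(self.enc(x))  # sparse feature activations
        return self.dec(f), f

# Hypothetical cache: an (n_patches_total, d_vit) float32 tensor written to disk
# once by a single pass over the ViT; mmap=True keeps it out of RAM.
acts = torch.load("shards/acts.pt", mmap=True)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(acts), batch_size=4096, shuffle=True
)

sae = TinySAE()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
for (x,) in loader:
    x_hat, f = sae(x)
    # Reconstruction error plus an L1 penalty that encourages sparse features.
    loss = (x_hat - x).pow(2).mean() + 4e-4 * f.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the disk cache is that the ViT forward pass dominates the cost; once activations are materialized, the small SAE trains quickly over many passes through the data.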