Storage & Run Manifest Spec (v1)
There are two main locations:
- $SAEV_SCRATCH/saev/shards: where we store transformer activations (referred to as shards_root in the codebase).
- $SAEV_NFS/saev/runs: where we store checkpoints and other computed intermediate stuff like example images, probe1d results, etc. (referred to as runs_root in the codebase).
Visually, these are:
$SAEV_SCRATCH/saev/
  shards/
    <shard_hash>/
      metadata.json
      shards.json
      acts000000.bin
      acts000001.bin
      ...
      labels.bin
and
$SAEV_NFS/saev/
  runs/
    <run_id>/
      checkpoint/           # output of train.py on <shard_hash>
        sae.pt
        config.json
      links/                # Symlinks
        train-shards        # $SCRATCH/saev/shards/<shard_hash>
        train-dataset       # Whatever the original image dataset was
        val-shards          # $SCRATCH/saev/shards/<shard_hash>
        val-dataset         # Whatever the original image dataset was
      inference/            # outputs from dump.py
        <shard_hash>/
          config.json
          token_acts.npz
          visuals/          # output of visuals.py
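As an illustration of the run layout above, here is a minimal sketch that creates an empty run directory and its links/ symlinks. The function name make_run_skeleton and its parameters are hypothetical and not part of saev. Only the directory and link names come from the tree above.

```python
import pathlib


def make_run_skeleton(
    runs_root: pathlib.Path,
    run_id: str,
    train_shards: pathlib.Path,
    train_dataset: pathlib.Path,
    val_shards: pathlib.Path,
    val_dataset: pathlib.Path,
) -> pathlib.Path:
    """Create <runs_root>/<run_id>/ with its subdirectories and links/ symlinks."""
    run = runs_root / run_id
    for sub in ("checkpoint", "links", "inference"):
        (run / sub).mkdir(parents=True, exist_ok=True)

    # Link names follow the tree above; the targets are supplied by the caller.
    targets = {
        "train-shards": train_shards,
        "train-dataset": train_dataset,
        "val-shards": val_shards,
        "val-dataset": val_dataset,
    }
    for name, target in targets.items():
        link = run / "links" / name
        # Existing symlinks are left untouched, even if they point elsewhere.
        if not link.is_symlink():
            # target_is_directory only has an effect on Windows.
            link.symlink_to(target, target_is_directory=True)
    return run
```

Keeping the shard and dataset locations behind symlinks means the run directory on NFS stays small while the large activation files remain on scratch storage.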
Each $SAEV_SCRATCH/saev/shards/<shard_hash>/ MUST include:
- metadata.json (UTF-8, canonical spec; see protocol.md)
- shards.json (UTF-8, shard index and sizes; see protocol.md)
- acts*.bin (binary shards; format in protocol.md)
- labels.bin (binary patch labels aligned to shards; format in protocol.md)
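The required-files rule above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named missing_shard_files. It only checks that the files exist and does not validate their contents against protocol.md.

```python
import pathlib

# Files that must exist under their exact names.
REQUIRED_FILES = ("metadata.json", "shards.json", "labels.bin")


def missing_shard_files(shard_dir: pathlib.Path) -> list[str]:
    """Return the required entries that are missing from one shard directory."""
    missing = [name for name in REQUIRED_FILES if not (shard_dir / name).is_file()]
    # At least one binary shard matching acts*.bin must be present.
    if not any(shard_dir.glob("acts*.bin")):
        missing.append("acts*.bin")
    return missing
```

An empty return value means the directory has every required entry. A non-empty list names what is missing, which is convenient for error messages.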
Note
Immutability: Files under saev/shards/<shard_hash>/ MUST be treated as read-only after publication. Any change yields a new shard_hash.
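One possible way to back the immutability rule with filesystem permissions is to clear the write bits once a shard directory is published. This is only a sketch: the spec does not prescribe a mechanism, the name mark_read_only is hypothetical, and the file owner can still restore write access.

```python
import pathlib
import stat


def mark_read_only(shard_dir: pathlib.Path) -> None:
    """Clear the write bits on every regular file directly inside shard_dir."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    for path in shard_dir.iterdir():
        if path.is_file():
            # S_IMODE keeps only the permission bits of the current mode.
            mode = stat.S_IMODE(path.stat().st_mode)
            path.chmod(mode & ~write_bits)
```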
All CLI entrypoints should accept a single --run <path> argument.
Every other path MUST be resolved from the run root:
- ViT activations: links/shards → saev/shards/<shard_hash>
- Dataset: links/dataset → Dataset root, wherever it is on disk.
- SAE checkpoint: checkpoint/sae.pt
Example resolution:
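import pathlib

run = pathlib.Path(cfg.run)
# links/ entries are symlinks; resolve() follows them to the real location.
shards_root = (run / "links" / "shards").resolve()
dataset_root = (run / "links" / "dataset").resolve()
ckpt = run / "checkpoint" / "sae.pt"
# Patch labels sit next to the acts*.bin files in the shard directory.
labels = shards_root / "labels.bin"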
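The single --run argument and the resolution rules above can be combined into one entrypoint helper. The sketch below assumes a plain argparse parser. The RunPaths dataclass and the resolve_run and parse_args functions are hypothetical names for illustration, not the saev CLI.

```python
import argparse
import dataclasses
import pathlib
from typing import Optional, Sequence


@dataclasses.dataclass(frozen=True)
class RunPaths:
    """All paths a script needs, derived from the run root only."""

    run: pathlib.Path
    shards_root: pathlib.Path
    dataset_root: pathlib.Path
    ckpt: pathlib.Path
    labels: pathlib.Path


def resolve_run(run: pathlib.Path) -> RunPaths:
    # resolve() follows the links/ symlinks to their real locations.
    shards_root = (run / "links" / "shards").resolve()
    return RunPaths(
        run=run,
        shards_root=shards_root,
        dataset_root=(run / "links" / "dataset").resolve(),
        ckpt=run / "checkpoint" / "sae.pt",
        labels=shards_root / "labels.bin",
    )


def parse_args(argv: Optional[Sequence[str]] = None) -> RunPaths:
    parser = argparse.ArgumentParser()
    # --run is the only path argument; everything else is derived from it.
    parser.add_argument("--run", type=pathlib.Path, required=True)
    return resolve_run(parser.parse_args(argv).run)
```

Passing an explicit list, for example parse_args(["--run", "/path/to/run"]), makes the helper easy to call from tests. With no argument it reads sys.argv as usual.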
$SAEV_SCRATCH and $SAEV_NFS should be set for all users/processes running saev tools.
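A tool can fail early when either variable is unset. The sketch below derives the two roots described at the top of this page from the environment. The function name storage_roots is hypothetical, and raising RuntimeError is an assumption about how a missing variable might be reported.

```python
import os
import pathlib
from typing import Mapping, Tuple


def storage_roots(
    env: Mapping[str, str] = os.environ,
) -> Tuple[pathlib.Path, pathlib.Path]:
    """Return (shards_root, runs_root) derived from $SAEV_SCRATCH and $SAEV_NFS."""
    # Treat empty values the same as unset variables.
    missing = [name for name in ("SAEV_SCRATCH", "SAEV_NFS") if not env.get(name)]
    if missing:
        raise RuntimeError(f"unset environment variables: {', '.join(missing)}")
    shards_root = pathlib.Path(env["SAEV_SCRATCH"]) / "saev" / "shards"
    runs_root = pathlib.Path(env["SAEV_NFS"]) / "saev" / "runs"
    return shards_root, runs_root
```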
FAQs
- Where do patch labels live? Next to acts*.bin, in $SAEV_SCRATCH/saev/shards/<shard_hash>/labels.bin. Scripts discover them via links/shards/labels.bin.
- Can I put datasets directly in $SAEV_SCRATCH? Sure, but not in $SAEV_SCRATCH/saev/shards.