Storage & Run Manifest Spec (v1)
There are two main locations:
- $SAEV_SCRATCH/saev/shards: where we store transformer activations (referred to as shards_root in the codebase).
- $SAEV_NFS/saev/runs: where we store checkpoints and other computed intermediate artifacts such as example images and probe1d results (referred to as runs_root in the codebase).
Visually, these are:
$SAEV_SCRATCH/saev/
  shards/
    <shard_hash>/
      metadata.json
      shards.json
      acts000000.bin
      acts000001.bin
      ...
      labels.bin
and
$SAEV_NFS/saev/
  runs/
    <run_id>/
      checkpoint/           # output of train.py on <shard_hash>
        sae.pt
        config.json
      links/                # Symlinks
        train-shards        # $SAEV_SCRATCH/saev/shards/<shard_hash>
        train-dataset       # Whatever the original image dataset was
        val-shards          # $SAEV_SCRATCH/saev/shards/<shard_hash>
        val-dataset         # Whatever the original image dataset was
      inference/            # outputs from dump.py
        <shard_hash>/
          config.json
          token_acts.npz
          visuals/          # output of visuals.py
Each $SAEV_SCRATCH/saev/shards/<shard_hash>/ MUST include:
- metadata.json (UTF-8, canonical spec; see protocol.md)
- shards.json (UTF-8, shard index and sizes; see protocol.md)
- acts*.bin (binary shards; format in protocol.md)
- labels.bin (binary patch labels aligned to shards; format in protocol.md)
Note
Immutability: Files under saev/shards/<shard_hash>/ MUST be treated as read-only after publication. Any change yields a new shard_hash.
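The required-files rule above lends itself to a quick sanity check before a shard directory is used. The following is a minimal sketch, not part of the saev codebase: the function name missing_shard_files and the constants are hypothetical, and it only checks that the files exist, not that their contents follow protocol.md.

```python
import pathlib

# Required entries per the list above. acts*.bin is a glob pattern because
# the number of activation shards varies from one shard directory to the next.
REQUIRED_FILES = ("metadata.json", "shards.json", "labels.bin")
ACTS_GLOB = "acts*.bin"


def missing_shard_files(shard_dir: pathlib.Path) -> list[str]:
    """Return the names (or glob patterns) of required entries that are absent.

    Hypothetical helper: an empty list means every required file is present.
    """
    missing = [name for name in REQUIRED_FILES if not (shard_dir / name).is_file()]
    # At least one activation shard must exist.
    if not any(shard_dir.glob(ACTS_GLOB)):
        missing.append(ACTS_GLOB)
    return missing
```

A caller might refuse to start training or inference when the returned list is non-empty and print the missing names instead.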
All CLI entrypoints should accept a single --run <path> argument.
Every other path MUST be resolved from the run root:
- ViT activations: links/shards → saev/shards/<shard_hash>
- Dataset: links/dataset → dataset root, wherever it is on disk.
- SAE checkpoint: checkpoint/sae.pt
Example resolution:
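import pathlib

# cfg.run holds the run root (the --run argument).
run = pathlib.Path(cfg.run)
# Follow the symlinks to the real shard and dataset directories.
shards_root = (run / "links" / "shards").resolve()
dataset_root = (run / "links" / "dataset").resolve()
ckpt = run / "checkpoint" / "sae.pt"
# Patch labels live next to the activation shards.
labels = shards_root / "labels.bin"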
$SAEV_SCRATCH and $SAEV_NFS should be set for all users/processes running saev tools.
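As an illustration of how the two roots follow from these variables, the sketch below derives them from an environment mapping and fails early when either variable is unset. The function name storage_roots is hypothetical; only the variable names and the saev/shards and saev/runs suffixes come from this spec.

```python
import os
import pathlib
from collections.abc import Mapping


def storage_roots(env: Mapping[str, str] = os.environ) -> tuple[pathlib.Path, pathlib.Path]:
    """Return (shards_root, runs_root) derived from the environment.

    Hypothetical helper: raises RuntimeError if a variable is unset or empty.
    """
    missing = [name for name in ("SAEV_SCRATCH", "SAEV_NFS") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Unset environment variables: {', '.join(missing)}")
    shards_root = pathlib.Path(env["SAEV_SCRATCH"]) / "saev" / "shards"
    runs_root = pathlib.Path(env["SAEV_NFS"]) / "saev" / "runs"
    return shards_root, runs_root
```

Passing the mapping explicitly, as in storage_roots({"SAEV_SCRATCH": "/scratch/me", "SAEV_NFS": "/nfs/me"}), makes the helper easy to exercise without touching the real environment.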
FAQs
- Where do patch labels live? Next to acts*.bin, at $SAEV_SCRATCH/saev/shards/<shard_hash>/labels.bin. Scripts discover them via links/shards/labels.bin.
- Can I put datasets directly in $SAEV_SCRATCH? Sure, but not in $SAEV_SCRATCH/saev/shards.