Storage & Run Manifest Spec (v1)

There are two main locations:

  1. $SAEV_SCRATCH/saev/shards: where we store transformer activations (referred to as shards_root in the codebase).
  2. $SAEV_NFS/saev/runs: where we store checkpoints and other computed artifacts such as example images and probe1d results (referred to as runs_root in the codebase).

Visually, these are:

$SAEV_SCRATCH/saev/
  shards/
    <shard_hash>/
      metadata.json
      shards.json
      acts000000.bin
      acts000001.bin
      ...
      labels.bin

and

$SAEV_NFS/saev/
  runs/
    <run_id>/
      checkpoint/           # output of train.py on <shard_hash>
        sae.pt
        config.json
      links/                # Symlinks
        train-shards        # $SAEV_SCRATCH/saev/shards/<shard_hash>
        train-dataset       # Whatever the original image dataset was
        val-shards          # $SAEV_SCRATCH/saev/shards/<shard_hash>
        val-dataset         # Whatever the original image dataset was
      inference/            # outputs from dump.py
        <shard_hash>/
          config.json
          patch_acts.npz
          visuals/          # output of visuals.py
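
The run layout above can be created with a few lines of Python. This is a minimal sketch, not saev's own code: the helper name make_run_skeleton and its links argument are hypothetical and exist only to illustrate the directory structure and the symlinks under links/.

```python
import pathlib


def make_run_skeleton(
    runs_root: pathlib.Path,
    run_id: str,
    links: dict[str, pathlib.Path],
) -> pathlib.Path:
    """Create <runs_root>/<run_id>/ with checkpoint/, links/ and inference/.

    `links` maps a link name (for example "train-shards") to its target.
    Hypothetical helper for illustration; not part of saev.
    """
    run = runs_root / run_id
    for sub in ("checkpoint", "links", "inference"):
        (run / sub).mkdir(parents=True, exist_ok=True)

    for name, target in links.items():
        link = run / "links" / name
        if link.is_symlink():
            link.unlink()  # Replace a stale link instead of failing.
        link.symlink_to(target, target_is_directory=True)

    return run
```

Calling make_run_skeleton(runs_root, "<run_id>", {"train-shards": shard_dir}) would leave links/train-shards pointing at the shard directory, so later steps only need the run root.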

Each $SAEV_SCRATCH/saev/shards/<shard_hash>/ MUST include:

  • metadata.json (UTF-8, canonical spec; see protocol.md)
  • shards.json (UTF-8, shard index and sizes; see protocol.md)
  • acts*.bin (binary shards; format in protocol.md)
  • labels.bin (binary patch labels aligned to shards; format in protocol.md)
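
The required-files list above lends itself to a quick presence check. The sketch below is an assumption about how one might write it; missing_shard_files is a hypothetical helper, and it only checks that the files exist, not that their contents follow protocol.md.

```python
import pathlib

# File names taken from the list above; acts*.bin is handled as a glob pattern.
REQUIRED_FILES = ("metadata.json", "shards.json", "labels.bin")


def missing_shard_files(shard_dir: pathlib.Path) -> list[str]:
    """Return the required entries missing from shard_dir (empty if complete)."""
    missing = [name for name in REQUIRED_FILES if not (shard_dir / name).is_file()]
    # Require at least one binary activation shard.
    if not any(shard_dir.glob("acts*.bin")):
        missing.append("acts*.bin")
    return missing
```

An empty return value means every required entry is present; anything else names what is missing.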

Note

Immutability: Files under saev/shards/<shard_hash>/ MUST be treated as read-only after publication. Any change yields a new shard_hash.
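
One way to back the read-only rule with file permissions is to clear the write bits once a shard directory is published. This is a sketch under that assumption; make_read_only is a hypothetical helper, and the spec itself does not say how immutability is enforced.

```python
import pathlib
import stat

# Write permission for user, group and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(shard_dir: pathlib.Path) -> None:
    """Clear the write bits on every regular file under shard_dir."""
    for path in shard_dir.rglob("*"):
        # Skip symlinks so we never change permissions outside shard_dir.
        if path.is_file() and not path.is_symlink():
            path.chmod(path.stat().st_mode & ~WRITE_BITS)
```

Directories are left writable here so that a whole shard directory can still be deleted; clearing their write bits as well would be a stricter variant.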

All CLI entrypoints should accept a single --run <path> argument. Every other path MUST be resolved from the run root:

  • ViT activations: links/shards → saev/shards/<shard_hash>
  • Dataset: links/dataset → Dataset root, wherever it is on disk.
  • SAE checkpoint: checkpoint/sae.pt
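
The single --run convention can be sketched with the standard library's argparse. This is only an illustration: parse_run is a hypothetical function, and the actual saev entrypoints may use a different argument parser.

```python
import argparse
import pathlib
from typing import Optional


def parse_run(argv: Optional[list[str]] = None) -> pathlib.Path:
    """Parse --run <path> and return the run root as a Path."""
    parser = argparse.ArgumentParser(description="Hypothetical saev entrypoint.")
    parser.add_argument(
        "--run", type=pathlib.Path, required=True, help="Path to the run root."
    )
    # argv=None makes argparse read sys.argv[1:], as a real CLI would.
    args = parser.parse_args(argv)
    return args.run
```

Because the run root is the only path on the command line, every other location has to be derived from it.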

Example resolution:

import pathlib

# cfg.run holds the value passed via --run.
run = pathlib.Path(cfg.run)
# Follow the symlinks under links/ to their real locations.
shards_root = (run / "links" / "shards").resolve()
dataset_root = (run / "links" / "dataset").resolve()
ckpt = run / "checkpoint" / "sae.pt"
labels = shards_root / "labels.bin"

  • $SAEV_SCRATCH and $SAEV_NFS should be set for all users and processes that run saev tools.
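
Since both variables are expected to be set, a tool can fail early with a clear message when one is missing. The sketch below assumes that behavior is wanted; get_roots is a hypothetical helper, and only the two variable names and the saev/shards and saev/runs suffixes come from this spec.

```python
import os
import pathlib
from typing import Mapping


def get_roots(env: Mapping[str, str] = os.environ) -> tuple[pathlib.Path, pathlib.Path]:
    """Return (shards_root, runs_root) derived from $SAEV_SCRATCH and $SAEV_NFS."""
    # Treat empty values the same as unset ones.
    missing = [name for name in ("SAEV_SCRATCH", "SAEV_NFS") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Unset environment variable(s): {', '.join(missing)}")
    shards_root = pathlib.Path(env["SAEV_SCRATCH"]) / "saev" / "shards"
    runs_root = pathlib.Path(env["SAEV_NFS"]) / "saev" / "runs"
    return shards_root, runs_root
```

Passing the environment as an argument keeps the function easy to test with a plain dict.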

FAQs

  • Where do patch labels live? Next to acts*.bin, at $SAEV_SCRATCH/saev/shards/<shard_hash>/labels.bin. Scripts discover them via links/shards/labels.bin.

  • Can I put datasets directly in $SAEV_SCRATCH? Sure, but not in $SAEV_SCRATCH/saev/shards.
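
The dataset-placement rule in the last answer can be checked mechanically. The following is a small sketch with a hypothetical helper, is_inside, that reports whether a candidate dataset path sits inside the shards directory.

```python
import pathlib


def is_inside(path: pathlib.Path, root: pathlib.Path) -> bool:
    """Return True if path is root itself or lies anywhere below it."""
    # Resolve both sides so symlinks and ".." segments cannot hide the overlap.
    path, root = path.resolve(), root.resolve()
    return path == root or root in path.parents
```

A setup script could refuse a dataset location when is_inside(dataset_root, shards_root) is true.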