The corpus
By default the proxy records the raw request of every call to~/.telos/corpus/<session>.jsonl — requests only, not responses (Anthropic is stateless; the Nth-turn
request already contains everything from the previous N−1 turns).
--no-recordturns recording off;--corpus-dirchanges the directory.- Functions:
record_call/load_session/list_sessions.
Controlled replay
replay_session(turns, mode, ...) replays a real session under a given mode: a byte-identical turn
sequence → RTK filtering (if mode.rtk) → the TELOS pipeline (if mode.telos) → sent upstream with
max_tokens=1 → only the usage is taken.
Why
max_tokens=1 — only prefill / cache billing is measured; output generation is deliberately
neutered, so the comparison isolates the prompt-side cost.[telos-replay ns=<session>/<mode>] is injected at the
very front of the system segment for each mode, so the Anthropic-side caches stay independent —
preventing an earlier-replayed mode from warming the cache for a later one to free-ride on. The result
is appended to usage_log with compare_group = <original session id> and replay: true.
Replay vs dual session
| Cost | Controlled variables | Suitable claim | |
|---|---|---|---|
| replay | 1 real session + cheap prefill | good (turns pinned) | “for a given workload, the token bill drops by X” |
| dual session | N×K full sessions | poor (trajectory forks) | “using TELOS, the agent is cheaper overall” |
SWE-bench Verified A/B
The pre-registered dual-arm study on real GitHub issues.