How do you prove TELOS saves money without paying for two full agent runs? You record one real session, then replay the byte-identical turn sequence under different modes, measuring only the billing.

The corpus

By default the proxy records the raw request of every call to ~/.telos/corpus/<session>.jsonl. Only requests are stored, not responses: the Anthropic API is stateless, so the Nth-turn request already contains everything from the previous N−1 turns.
telos replay --list                # list recorded sessions
  • --no-record turns recording off; --corpus-dir changes the directory.
  • Functions: record_call / load_session / list_sessions.
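The corpus layout described above (one JSONL file per session, one request per line) can be sketched as follows. This is a minimal illustration, not the TELOS source: the function names match the ones listed, but the signatures, the explicit corpus_dir argument, and the re-serialization through json.dumps are assumptions. A recorder that must stay byte-identical would store the raw request body rather than re-encoding it.

```python
import json
from pathlib import Path


def record_call(corpus_dir: Path, session: str, request: dict) -> None:
    """Append one request as a single JSON line to <corpus_dir>/<session>.jsonl."""
    corpus_dir.mkdir(parents=True, exist_ok=True)
    with (corpus_dir / f"{session}.jsonl").open("a", encoding="utf-8") as f:
        f.write(json.dumps(request, ensure_ascii=False) + "\n")


def load_session(corpus_dir: Path, session: str) -> list[dict]:
    """Return the recorded requests of one session, in call order."""
    with (corpus_dir / f"{session}.jsonl").open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def list_sessions(corpus_dir: Path) -> list[str]:
    """Return the ids of all recorded sessions (file names without .jsonl)."""
    if not corpus_dir.is_dir():
        return []
    return sorted(p.stem for p in corpus_dir.glob("*.jsonl"))
```

Because each later request already carries the earlier turns, loading the file in order is enough to reconstruct the whole conversation without any stored responses.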

Controlled replay

telos replay --session <id> --modes none telos rtk both
replay_session(turns, mode, ...) replays a real session under a given mode. Each turn of the byte-identical sequence passes through RTK filtering (if mode.rtk), then the TELOS pipeline (if mode.telos), and is sent upstream with max_tokens=1. Only the usage is kept from the response.
Why max_tokens=1: only prefill and cache billing are measured. Output generation is deliberately suppressed, so the comparison isolates the prompt-side cost.
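The replay loop can be sketched roughly as below. Everything beyond the names replay_session, mode.rtk, mode.telos and max_tokens is an assumption made for illustration: the Mode dataclass, the MODES table, and the rtk_filter, telos_pipeline and send callables are hypothetical stand-ins, and the real signature may differ.

```python
import copy
from dataclasses import dataclass
from typing import Callable

Request = dict
Transform = Callable[[Request], Request]


@dataclass(frozen=True)
class Mode:
    # Hypothetical container for the two switches the text mentions.
    name: str
    rtk: bool = False
    telos: bool = False


# Assumed mapping of the four CLI mode names onto the two switches.
MODES = {
    "none": Mode("none"),
    "rtk": Mode("rtk", rtk=True),
    "telos": Mode("telos", telos=True),
    "both": Mode("both", rtk=True, telos=True),
}


def replay_session(
    turns: list[Request],
    mode: Mode,
    rtk_filter: Transform,
    telos_pipeline: Transform,
    send: Callable[[Request], dict],
) -> list[dict]:
    """Replay recorded turns under one mode and collect only the usage."""
    usages = []
    for turn in turns:
        request = copy.deepcopy(turn)  # never mutate the recorded corpus
        if mode.rtk:
            request = rtk_filter(request)
        if mode.telos:
            request = telos_pipeline(request)
        request["max_tokens"] = 1  # pay for prefill, not for generation
        response = send(request)
        usages.append(response["usage"])  # the response body is discarded
    return usages
```

Passing send in as a callable keeps the sketch independent of any particular HTTP client and makes the loop easy to exercise against a stub.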
Cache isolation. By default, each mode gets a unique prefix [telos-replay ns=<session>/<mode>] injected at the very front of the system segment, so the Anthropic-side caches stay independent. Without it, a mode replayed earlier would warm the cache and a later mode would free-ride on it. Each result is appended to usage_log with compare_group = <original session id> and replay: true.
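A sketch of the prefix injection and the log record, under stated assumptions: the helper names isolate_cache and usage_log_entry are hypothetical, as are the "mode" and "usage" keys of the log entry. The only facts taken from the text are the marker format, its position at the front of the system segment, and the compare_group and replay fields. The Messages API accepts the system prompt either as a string or as a list of text blocks, so both shapes are handled.

```python
import copy


def isolate_cache(request: dict, session: str, mode: str) -> dict:
    """Return a copy of the request with a per-mode marker leading the system prompt."""
    marker = f"[telos-replay ns={session}/{mode}]"
    isolated = copy.deepcopy(request)
    system = isolated.get("system")
    if system is None:
        isolated["system"] = marker
    elif isinstance(system, str):
        isolated["system"] = f"{marker}\n{system}"
    else:
        # List form: put the marker in its own leading text block.
        isolated["system"] = [{"type": "text", "text": marker}, *system]
    return isolated


def usage_log_entry(session: str, mode: str, usage: dict) -> dict:
    """Build one usage_log record; 'mode' and 'usage' are assumed field names."""
    return {
        "compare_group": session,  # the original session id
        "replay": True,
        "mode": mode,
        "usage": usage,
    }
```

Since the marker differs per mode and sits at the start of the system prompt, the system and message portions of two modes' prompts no longer share a cacheable prefix.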

Replay vs dual session

Approach     | Cost                           | Controlled variables    | Suitable claim
replay       | 1 real session + cheap prefill | good (turns pinned)     | “for a given workload, the token bill drops by X”
dual session | N×K full sessions              | poor (trajectory forks) | “using TELOS, the agent is cheaper overall”
Replay is the tool for an honest, reproducible savings number on a fixed workload. The SWE-bench A/B is the dual-session approach, used to show that correctness does not regress.
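To turn replayed usage into a "the token bill drops by X" figure, the rows of one compare_group can be reduced to a weighted prompt-side cost per mode and compared against the none baseline. The sketch below is illustrative only. The row shape (mode and usage keys) is an assumption, and the weights are placeholder multipliers relative to the base input price that should be replaced with the current pricing of the model in use. The usage field names are the ones returned by the Anthropic Messages API.

```python
from collections import defaultdict

# Placeholder relative weights (assumption): base input, cache write, cache read.
WEIGHTS = {
    "input_tokens": 1.0,
    "cache_creation_input_tokens": 1.25,
    "cache_read_input_tokens": 0.1,
}


def prompt_cost(usage: dict, weights: dict = WEIGHTS) -> float:
    """Weighted prompt-side cost of one call, in base-input-token units."""
    return sum(w * (usage.get(field) or 0) for field, w in weights.items())


def savings_vs_baseline(rows: list[dict], group: str, baseline: str = "none") -> dict:
    """Fractional saving of each mode relative to the baseline mode."""
    totals = defaultdict(float)
    for row in rows:
        # Only replayed rows of the requested comparison group count.
        if row.get("replay") and row.get("compare_group") == group:
            totals[row["mode"]] += prompt_cost(row["usage"])
    base = totals.get(baseline, 0.0)
    if base <= 0:
        raise ValueError(f"no baseline cost recorded for mode {baseline!r}")
    return {m: 1 - cost / base for m, cost in totals.items() if m != baseline}
```

Output tokens are left out on purpose: with max_tokens=1 they are the same for every mode and say nothing about the prompt-side difference.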

SWE-bench Verified A/B

The pre-registered dual-arm study on real GitHub issues.