The savings dashboard answers the boss’s question: after adopting TELOS, how many tokens were saved, and how much money? It reads the JSONL usage_log (one line per call) and aggregates it in real time.
telos dashboard --usage-log ~/.telos/usage.jsonl --out savings.html
# or open the proxy-embedded /__telos/dashboard

The four token buckets

Anthropic’s usage field is normalized into four numbers:
| Bucket | Source | Meaning |
|---|---|---|
| raw_input | usage.input_tokens | prompt tokens that missed the cache and did not write it |
| cache_read | usage.cache_read_input_tokens | prompt tokens that hit the cache (the key to savings) |
| cache_write | usage.cache_creation_input_tokens | prompt tokens newly written to the cache this call |
| output | usage.output_tokens | tokens generated by the model |
Anthropic’s input_tokens is only raw_input — it excludes cache_read and cache_write. So total prompt tokens = raw_input + cache_read + cache_write, and every hit% denominator on the dashboard uses this sum. Using a smaller denominator overestimates the hit rate.
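The normalization and the hit% denominator can be sketched in a few lines of Python. This is a minimal illustration, not the TELOS implementation: the function names `normalize_usage` and `hit_rate` are hypothetical, and only the four usage field names come from the table above.

```python
def normalize_usage(usage: dict) -> dict:
    """Map an Anthropic-style usage dict onto the four dashboard buckets.

    Missing or null fields are treated as 0 (an assumption of this sketch).
    """
    return {
        "raw_input": usage.get("input_tokens") or 0,
        "cache_read": usage.get("cache_read_input_tokens") or 0,
        "cache_write": usage.get("cache_creation_input_tokens") or 0,
        "output": usage.get("output_tokens") or 0,
    }


def hit_rate(buckets: dict) -> float:
    """cache_read as a fraction of ALL prompt tokens (0.0 if there are none)."""
    total_prompt = (
        buckets["raw_input"] + buckets["cache_read"] + buckets["cache_write"]
    )
    return buckets["cache_read"] / total_prompt if total_prompt else 0.0


# Made-up numbers for illustration.
usage = {
    "input_tokens": 1_000,
    "cache_read_input_tokens": 8_000,
    "cache_creation_input_tokens": 1_000,
    "output_tokens": 500,
}
buckets = normalize_usage(usage)

# Correct denominator: 1,000 + 8,000 + 1,000 = 10,000 prompt tokens.
print(hit_rate(buckets))  # 0.8

# Dropping cache_write from the denominator inflates the rate (8,000 / 9,000).
naive = buckets["cache_read"] / (buckets["raw_input"] + buckets["cache_read"])
print(naive > hit_rate(buckets))  # True
```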

Price table (USD per 1M tokens, 2026 public prices)

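| Model prefix | input | cache_read | cache_write 5m | cache_write 1h | output |
|---|---|---|---|---|---|
| claude-opus-4-7 / 4-6 | 5.00 | 0.50 | 6.25 | 10.00 | 25.00 |
| claude-opus-4-5 / 4 | 15.00 | 1.50 | 18.75 | 30.00 | 75.00 |
| claude-sonnet-4-6 / 4-5 / 4 | 3.00 | 0.30 | 3.75 | 6.00 | 15.00 |
| claude-haiku-4-5 / 4 | 1.00 | 0.10 | 1.25 | 2.00 | 5.00 |
| gpt-5 / gpt-5.1 | 5.00 | 1.25 | 0 | 0 | 15.00 |
| deepseek-chat / v3 | 0.27 | 0.07 | 0 | 0 | 1.10 |
| _default (unrecognized) | 3.00 | 0.30 | 3.75 | 6.00 | 15.00 |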
Anthropic’s cache billing rules, already encoded:
cache_read price = 0.10 × input price        cache hit → 90% off
cache_write 5m   = 1.25 × input price         short-TTL write → pay 25% more
cache_write 1h   = 2.00 × input price         long-TTL write → pay 100% more
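As a quick check, the three multipliers reproduce the Anthropic rows of the price table from the input price alone. The sketch below is illustrative only; `anthropic_cache_prices` and the constant names are hypothetical, not part of TELOS.

```python
# Multipliers taken from the billing rules above.
CACHE_READ_MULT = 0.10
CACHE_WRITE_5M_MULT = 1.25
CACHE_WRITE_1H_MULT = 2.00


def anthropic_cache_prices(input_price: float) -> dict:
    """Derive the cache prices (USD per 1M tokens) from the base input price."""
    return {
        "input": input_price,
        # round() hides binary floating-point noise such as 0.30000000000000004
        "cache_read": round(input_price * CACHE_READ_MULT, 4),
        "write_5m": round(input_price * CACHE_WRITE_5M_MULT, 4),
        "write_1h": round(input_price * CACHE_WRITE_1H_MULT, 4),
    }


# Base input prices of the four Anthropic rows in the table.
for base in (5.00, 15.00, 3.00, 1.00):
    print(base, anthropic_cache_prices(base))
```

The gpt-5 and deepseek rows do not follow these multipliers (their cache_write prices are listed as 0), so they would need their own entries rather than this derivation.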

How savings are computed

Actual cost of a call, where each bucket's token count (in millions) is multiplied by that bucket's price from the table:
cost = raw_input × input + cache_read × cache_read
     + cache_write_5m × write_5m + cache_write_1h × write_1h + output × output
Counterfactual (“if TELOS were off”: all cache_control removed, every prompt token billed at the base input price), with cache_write = cache_write_5m + cache_write_1h:
counterfactual = (raw_input + cache_read + cache_write) × input + output × output
Saved:
saved = counterfactual − actual
      = cache_read     × (input − cache_read)        ← earned back by cache hits
      + cache_write_5m × (input − write_5m)          ← 5m write premium, negative
      + cache_write_1h × (input − write_1h)          ← 1h write premium, more negative
For Anthropic, the cache_write term is a negative contribution — a write costs 25–100% more than base price. Only when cache_read volume is large enough (the cache is reused many times) does the total go positive. The dashboard’s saved $ already counts this premium against you.
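The formulas above can be worked through numerically. The following is a minimal sketch using the Opus 4.7 row of the price table; the `Prices` and `Tokens` classes and all function names are hypothetical, chosen for illustration, and the token counts are made up.

```python
from dataclasses import dataclass

PER_MILLION = 1_000_000  # table prices are quoted in USD per 1M tokens


@dataclass(frozen=True)
class Prices:
    input: float
    cache_read: float
    write_5m: float
    write_1h: float
    output: float


@dataclass(frozen=True)
class Tokens:
    raw_input: int = 0
    cache_read: int = 0
    cache_write_5m: int = 0
    cache_write_1h: int = 0
    output: int = 0


def actual_cost(t: Tokens, p: Prices) -> float:
    """Each bucket billed at its own price."""
    return (
        t.raw_input * p.input
        + t.cache_read * p.cache_read
        + t.cache_write_5m * p.write_5m
        + t.cache_write_1h * p.write_1h
        + t.output * p.output
    ) / PER_MILLION


def counterfactual_cost(t: Tokens, p: Prices) -> float:
    """Every prompt token billed at the base input price."""
    prompt = t.raw_input + t.cache_read + t.cache_write_5m + t.cache_write_1h
    return (prompt * p.input + t.output * p.output) / PER_MILLION


def saved(t: Tokens, p: Prices) -> float:
    return counterfactual_cost(t, p) - actual_cost(t, p)


def saved_by_term(t: Tokens, p: Prices) -> dict:
    """The three-term expansion; raw_input and output cancel out."""
    return {
        "cache_read": t.cache_read * (p.input - p.cache_read) / PER_MILLION,
        "cache_write_5m": t.cache_write_5m * (p.input - p.write_5m) / PER_MILLION,
        "cache_write_1h": t.cache_write_1h * (p.input - p.write_1h) / PER_MILLION,
    }


OPUS_4_7 = Prices(input=5.00, cache_read=0.50, write_5m=6.25, write_1h=10.00, output=25.00)

# 1M tokens written to the 5m cache and never read again: pure premium.
write_only = Tokens(cache_write_5m=1_000_000)
print(saved(write_only, OPUS_4_7))  # -1.25

# The same write, followed by one full 1M-token cache hit.
write_then_read = Tokens(cache_read=1_000_000, cache_write_5m=1_000_000)
print(saved(write_then_read, OPUS_4_7))  # 3.25
print(saved_by_term(write_then_read, OPUS_4_7))
```

In this example the write alone shows up as −$1.25, and the single re-read contributes +$4.50, which is how the negative write term and the positive read term combine in the saved $ figure.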

What the dashboard shows

  • Hero — tokens saved (total.cache_read) and cost saved (total.saved_usd), with subtitles for hit-% of total prompt tokens and % off counterfactual cost.
  • KPI bar — total calls, unique sessions, cumulative raw_input / cache_read / cache_write (5m·1h split) / output.
  • Token mix — a stacked bar over the four buckets (🟠 raw_input · 🟢 cache_read · 🟡 cache_write · 🔵 output).
  • Activity over time — hourly buckets; green bars = cache_read volume, purple line = saved_usd.
  • Breakdowns — by harness / model / session, sorted by cache_read descending.
The live savings dashboard intentionally stays focused on these totals; mode, compare_group, and replay metadata from replay/showcase workflows are kept off this page (use Replay & Comparison and /__telos/developer.json).
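A breakdown of the kind listed above can be sketched as a single pass over the JSONL log. The record fields used here (`model`, `session`, `cache_read`, `saved_usd`) are assumptions for illustration only; the actual usage_log schema is not documented on this page, and `breakdown` is a hypothetical helper.

```python
import json
from collections import defaultdict

# Hypothetical log lines: field names and values are made up for this sketch.
SAMPLE_LOG = """\
{"model": "claude-opus-4-7", "session": "s1", "cache_read": 8000, "saved_usd": 0.036}
{"model": "claude-sonnet-4-6", "session": "s2", "cache_read": 20000, "saved_usd": 0.054}
{"model": "claude-opus-4-7", "session": "s1", "cache_read": 4000, "saved_usd": 0.018}
"""


def breakdown(lines, key):
    """Group calls by `key` and sort groups by cache_read, largest first."""
    totals = defaultdict(lambda: {"calls": 0, "cache_read": 0, "saved_usd": 0.0})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        row = totals[record.get(key, "unknown")]
        row["calls"] += 1
        row["cache_read"] += record.get("cache_read", 0)
        row["saved_usd"] += record.get("saved_usd", 0.0)
    return sorted(totals.items(), key=lambda kv: kv[1]["cache_read"], reverse=True)


by_model = breakdown(SAMPLE_LOG.splitlines(), "model")
for name, row in by_model:
    print(name, row["calls"], row["cache_read"], round(row["saved_usd"], 3))
```

Passing `"session"` instead of `"model"` gives the per-session view with the same ordering rule.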

Common questions

Cache hits do save money. A token that hits the cache pays only 10% of the base input price, so every extra 1M tokens of cache_read saves $4.50 on Opus 4.7 ($5.00 − $0.50).
Cache writes do not necessarily save money. Every 1M tokens written to the 5m cache costs $6.25 (25% over input’s $5), and to the 1h cache $10 (100% over). Only when that cache is subsequently hit many times can the write premium be amortized. The saved $ figure already counts the premium as a negative contribution.
The hit% denominator is the full prompt volume. Because Anthropic’s input_tokens refers only to the part that missed and didn’t write the cache, the true total prompt volume = raw_input + cache_read + cache_write. A smaller denominator overestimates the hit rate.
A model that is not in the price table falls to _default (estimated at the Sonnet tier). In that case saved $ is an approximate estimate, not a precise bill.
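The fallback to _default can be illustrated with a prefix lookup. This sketch assumes longest-prefix matching on the model name, which the "Model prefix" column suggests but this page does not confirm; `PRICES`, `lookup_prices`, and the example model names are hypothetical.

```python
# (input, cache_read, write_5m, write_1h, output) in USD per 1M tokens,
# copied from three rows of the price table plus the _default row.
PRICES = {
    "claude-opus-4-7": (5.00, 0.50, 6.25, 10.00, 25.00),
    "claude-sonnet-4-6": (3.00, 0.30, 3.75, 6.00, 15.00),
    "claude-haiku-4-5": (1.00, 0.10, 1.25, 2.00, 5.00),
    "_default": (3.00, 0.30, 3.75, 6.00, 15.00),
}


def lookup_prices(model: str):
    """Return (matched_prefix, prices, is_estimate) for a model name.

    Assumption: the longest matching prefix wins; with no match the
    _default row is used and the result is flagged as an estimate.
    """
    matches = [p for p in PRICES if p != "_default" and model.startswith(p)]
    if not matches:
        return "_default", PRICES["_default"], True
    best = max(matches, key=len)
    return best, PRICES[best], False


# Made-up model names for illustration.
print(lookup_prices("claude-haiku-4-5-some-suffix"))
print(lookup_prices("my-local-model"))
```

Carrying an `is_estimate` flag alongside the prices is one way a caller could mark saved $ as approximate when the model was not recognized.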

Developer Page

The live in-memory view: IR structure, band distribution, BP slots, and tool stats.