Where the state lives
REFRESH_THRESHOLD = 11 (R8 adaptive gating): if the number of real requests within the renewal
window is below the threshold, the refresh is skipped and the cache is left to expire naturally, so
low-activity sessions do not pay a renewal cost greater than its benefit.
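The gating rule above can be sketched in a few lines. This is only an illustration of the comparison described in the text: the function name should_refresh and its parameter names are hypothetical, and how the real request count per window is measured is not shown here.

```python
REFRESH_THRESHOLD = 11  # value quoted in the text above


def should_refresh(real_requests_in_window: int,
                   threshold: int = REFRESH_THRESHOLD) -> bool:
    """Return True when the session is active enough to justify a renewal.

    Hypothetical helper: below the threshold the refresh is skipped and the
    cache is simply allowed to expire.
    """
    return real_requests_in_window >= threshold
```

Under this reading, a session with 10 real requests in the window is left to expire, while one with 11 or more is refreshed.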
Held automatically on Path A (SDK)
A TelosAnthropicTransport / TelosOpenAITransport instance equals one session. __init__ creates a
BridgeSessionState internally; each _do_create passes it to Bridge, and on each response
bridge.absorb_usage(...) accumulates the cache_creation.
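The "one instance equals one session" pattern can be pictured with a minimal sketch. SessionStateSketch and TransportSketch are hypothetical stand-ins, not the real BridgeSessionState or transport classes, and the usage key cache_creation_input_tokens is assumed from the Anthropic usage payload rather than taken from this project's code.

```python
from dataclasses import dataclass


@dataclass
class SessionStateSketch:
    """Hypothetical stand-in for the per-session state."""
    cache_creation: int = 0  # cumulative cache-creation tokens
    turns: int = 0

    def absorb_usage(self, usage: dict) -> None:
        # A missing key counts as zero, so a response without cache data
        # leaves the running total unchanged.
        self.cache_creation += int(usage.get("cache_creation_input_tokens", 0))
        self.turns += 1


class TransportSketch:
    """Hypothetical stand-in for a transport: one instance, one session."""

    def __init__(self) -> None:
        # Created once and reused on every turn of this instance.
        self.state = SessionStateSketch()

    def create(self, response_usage: dict) -> SessionStateSketch:
        # A real transport would send the request here; this sketch only
        # folds the response's usage into the shared state.
        self.state.absorb_usage(response_usage)
        return self.state
```

Two separate TransportSketch instances never share counters, which is the property the text relies on when it equates an instance with a session.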
Held automatically on Path B (proxy), keyed by session-id
Inside the proxy, _SessionRegistry (an OrderedDict LRU, default cap 10000) holds the state keyed by
session_id. The session_id derivation priority, highest first:
- the x-telos-session HTTP header (explicit override);
- metadata.user_id (an Anthropic SDK built-in field);
- blake2b(api_key + system + tools + messages[0]) → telos-<16 hex>.
The fallback hash is deterministic: the same API key, system, tools, and first message → the same
session_id; a different initial prompt → a different one; two users with different API keys → different
ones. Once the LRU cap is exceeded the oldest session is evicted, with an INFO log.
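The derivation order and the LRU eviction described above can be sketched as follows. This is an illustration under stated assumptions, not the proxy's actual code: derive_session_id and SessionRegistrySketch are hypothetical names, the JSON serialization of the hash input is a guess (the text only gives the ingredients), and header lookup here is case-sensitive for simplicity.

```python
import hashlib
import json
import logging
from collections import OrderedDict

log = logging.getLogger("session_registry_sketch")


def derive_session_id(headers: dict, body: dict, api_key: str) -> str:
    # 1. Explicit override via the HTTP header.
    explicit = headers.get("x-telos-session")
    if explicit:
        return explicit
    # 2. metadata.user_id from the request body.
    user_id = (body.get("metadata") or {}).get("user_id")
    if user_id:
        return user_id
    # 3. Fallback hash over api_key, system, tools and the first message.
    #    Assumption: the real byte layout fed to blake2b may differ.
    first_message = (body.get("messages") or [None])[0]
    material = json.dumps(
        [api_key, body.get("system"), body.get("tools"), first_message],
        sort_keys=True,
    ).encode("utf-8")
    digest = hashlib.blake2b(material, digest_size=8).hexdigest()  # 16 hex chars
    return f"telos-{digest}"


class SessionRegistrySketch:
    """Hypothetical LRU registry keyed by session_id."""

    def __init__(self, max_sessions: int = 10000) -> None:
        self._max = max_sessions
        self._states: OrderedDict = OrderedDict()

    def get(self, session_id: str) -> dict:
        if session_id in self._states:
            self._states.move_to_end(session_id)  # mark as most recently used
            return self._states[session_id]
        state = self._states[session_id] = {"cache_creation": 0}
        if len(self._states) > self._max:
            evicted, _ = self._states.popitem(last=False)  # drop the oldest
            log.info("evicted session %s", evicted)
        return state
```

Because the fallback is a pure function of its inputs, replaying the same opening request with the same key lands on the same registry entry, while a change to any ingredient produces a new one.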
Observing accumulation
usage_log adds a cumulative block per line:
cache_creation increasing monotonically shows accumulation is working. The refpool_slugs array
should not grow repeatedly across turns: the same document should not be registered again and
again.
Disabling accumulation
Not passing session_state, or restarting the proxy, makes the behavior fall back to a fresh Bridge
per turn. This was the default before 1.0, and it does not break wire bytes; you simply lose the
cross-turn ref-pool and counters.
Observability
Watch the accumulated cache reads turn into dollars.