The core idea
Most agent frameworks treat the KV cache as a runtime gift that the inference engine may or may not grant. TELOS inverts this: cache reuse is a structural property of the prompt itself, not a matter of runtime luck. If you never touch bytes already submitted, nothing in the prompt can invalidate the cached prefix.
The “stone tablet” metaphor
TELOS = Stable prefix · Tiered bands · Ephemeral tail · Layered adapters · Anchored marks. The inscription on the base of a stone tablet (the durable prefix) is carved once and used for a lifetime; the inscriptions added above it over time (new content each round) can be erased and rewritten at any time, but they never touch the base. The entire value of the KV cache lies in keeping the base intact.
Three interlocking ideas
1. Three-color bands
Every content block declares its cache lifetime at birth: not through post-hoc heuristics or LLM guessing, but as a first-class structural annotation.

| Band | Color | Semantics | Cache behavior |
|---|---|---|---|
| PIN | 🟢 | Tool defs · system prompt · current question | Permanent. Never evicted. The immutable base of every request’s prefix hash. |
| FOLD | 🟡 | Conversation history · tool results · large docs | Cacheable, compactable. Under pressure, replaced by a summary — PIN prefix bytes stay untouched. |
| DROP | 🔴 | Timestamps · CWD · git status · PIDs | Ephemeral. Excluded entirely from the prefix hash. Never contaminates upstream bytes. |
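
One way to picture "declared at birth" is a block type whose band is a required, immutable field. The sketch below is only an illustration: the names `Band` and `Block`, and the sample strings, are hypothetical and not taken from TELOS itself.

```python
from dataclasses import dataclass, FrozenInstanceError
from enum import IntEnum


class Band(IntEnum):
    """Cache lifetime of a block; values follow the PIN, FOLD, DROP order."""
    PIN = 0   # permanent prefix material
    FOLD = 1  # cacheable, may later be replaced by a summary
    DROP = 2  # ephemeral, kept out of the prefix hash


@dataclass(frozen=True)
class Block:
    """Hypothetical content block; `band` is required and has no default."""
    band: Band
    text: str


system = Block(Band.PIN, "You are a coding agent.")
history = Block(Band.FOLD, "user: list the failing tests")
clock = Block(Band.DROP, "now=2024-01-01T00:00:00Z")

# The lifetime is fixed at construction: reassigning it raises.
try:
    system.band = Band.DROP
    reassigned = True
except FrozenInstanceError:
    reassigned = False
```

Making the dataclass frozen means a block's band cannot be changed by a later heuristic; a different lifetime requires constructing a new block.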
2. Monotonic append
The prompt is an append-only stream. New turns only add blocks to the tail; bytes already submitted are never mutated. A “modification” is expressed as a new block (a summary, a redaction), never as an in-place rewrite.
3. Prefix-hash exclusion of DROP
When the prefix hash is computed, all DROP blocks are structurally excluded. A timestamp or git status that changes every turn can never push the PIN bytes to a different byte offset, because DROP content sits after every cache breakpoint and is never part of the hashed prefix.
The three invariants, formally
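Let a prompt be a finite sequence of blocks $b_1, \dots, b_n$, each assigned a type $\tau(b_i) \in \{\mathrm{PIN}, \mathrm{FOLD}, \mathrm{DROP}\}$ with the partial order $\mathrm{PIN} \prec \mathrm{FOLD} \prec \mathrm{DROP}$.
I1 · Ordering
For all $i < j$, $\tau(b_i) \preceq \tau(b_j)$. Blocks are physically arranged pin* → fold* → drop*.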
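
The ordering invariant can be checked mechanically. The sketch below assumes bands are modeled as a hypothetical `Band` integer enum whose values follow the PIN, FOLD, DROP order; the function name `first_ordering_violation` is likewise an illustration, not the project's own check. Because the order is transitive, comparing adjacent blocks is enough.

```python
from enum import IntEnum


class Band(IntEnum):
    PIN = 0
    FOLD = 1
    DROP = 2


def first_ordering_violation(bands):
    """Return the index of the first block whose band is lower than its
    predecessor's, or None if the sequence satisfies I1."""
    for i in range(1, len(bands)):
        if bands[i] < bands[i - 1]:
            return i
    return None


ok = [Band.PIN, Band.PIN, Band.FOLD, Band.DROP]
bad = [Band.PIN, Band.FOLD, Band.PIN, Band.DROP]  # a PIN after a FOLD
```

Here `first_ordering_violation(ok)` finds nothing, while `bad` is flagged at the index of the late PIN block.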
I2 · Monotonic append
Advancing from turn $t$ to turn $t+1$, the bytes submitted through turn $t$ are a prefix of the bytes submitted at turn $t+1$.
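
Stated over byte strings, I2 is a plain prefix test. The following is a minimal sketch, assuming each turn's submitted bytes are available as a `bytes` value; the helper name `is_monotonic_append` and the bracketed sample payloads are made up for illustration.

```python
def is_monotonic_append(previous: bytes, current: bytes) -> bool:
    """True if `previous` is a byte-for-byte prefix of `current`."""
    return current.startswith(previous)


turn_1 = b"[system]You are a coding agent.[user]list failing tests"
turn_2 = turn_1 + b"[tool]3 tests failed"

# An in-place edit of earlier bytes (here, the system text) breaks I2,
# even though the same new block is appended at the tail.
rewritten = turn_1.replace(b"coding", b"review") + b"[tool]3 tests failed"
```

A pure append passes the test; the rewritten variant fails it because its first differing byte falls inside the region already submitted at turn 1.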
I3 · Prefix-hash exclusion
All DROP blocks are excluded from the prefix-hash computation.
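
A sketch of what exclusion could look like in practice: hash only the non-DROP blocks, so two requests that differ solely in their ephemeral tail produce the same digest. The choice of SHA-256, the length-prefix framing, and the names `Band` and `prefix_hash` are assumptions made for this example, not details confirmed by TELOS.

```python
import hashlib
from enum import IntEnum


class Band(IntEnum):
    PIN = 0
    FOLD = 1
    DROP = 2


def prefix_hash(blocks):
    """SHA-256 over the non-DROP blocks, in order; DROP blocks are skipped."""
    h = hashlib.sha256()
    for band, payload in blocks:
        if band == Band.DROP:
            continue
        # Length-prefix each payload so block boundaries are unambiguous.
        h.update(len(payload).to_bytes(8, "big"))
        h.update(payload)
    return h.hexdigest()


base = [(Band.PIN, b"system prompt"), (Band.FOLD, b"turn 1 history")]
turn_a = base + [(Band.DROP, b"now=12:00:01")]
turn_b = base + [(Band.DROP, b"now=12:00:02")]
```

With this construction `prefix_hash(turn_a)` equals `prefix_hash(turn_b)`, whereas appending a new FOLD block changes the digest.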
Versus existing practice
| Practice | I1 | I2 | I3 |
|---|---|---|---|
| Naive chat completion, system rewritten each turn | ✗ | ✗ | ✗ |
| LangChain ConversationBuffer | ✗ | ✓ | ✗ |
| Anthropic cache_control breakpoints | partial | ✓ | ✗ |
| OpenAI prompt_cache_key | partial | ✓ | ✗ |
| TELOS | ✓ | ✓ | ✓ |
Next: Three-color bands
The data structure behind PIN / FOLD / DROP and the §5 ordering check.