Skip to main content
TELOS — Portable Agent Context
One canonical IR — tools, system, turns, and memory — runs unchanged across Anthropic · OpenAI · DeepSeek · vLLM · SGLang. Most agent frameworks treat the KV cache as a runtime gift the inference engine may or may not hand you. TELOS inverts that: cache reuse becomes a structural property of the prompt itself. Stabilize the bytes that are genuinely stable, and the cache cannot be invalidated.
Real 6-turn session −92.3%. Cost reported in absolute $/query-resolved — ratios can be gamed; dollars can’t. From the LEAP Lab @ Tsinghua University.

Quickstart

Install, connect, and watch the savings in three commands.

The Protocol

Three-color bands and monotonic append — why the cache cannot break.

Support Matrix

Harnesses, frontier models, and self-hosted inference frameworks.

Why TELOS

TELOS solves exactly two things.

Push token efficiency to the limit

A 6-turn real session drops −92.3%; a controlled 48-call run drops −36.6% (net −$2.16). Every cent is accounted for in absolute $/query-resolved.

Return context sovereignty to you

TelosIR is an engine-agnostic, serializable, portable context representation — your persona, your tools, your 20-turn thread, all on the same stone tablet. Hand it to Claude today, move it to DeepSeek tomorrow, run it on local vLLM tonight.

Key capabilities

Three-color bands

Every block declares its cache lifetime at birth: PIN / FOLD / DROP.

Monotonic append

The prompt is an append-only stream — submitted bytes are never mutated.

Multi-engine adapters

Anthropic, OpenAI, DeepSeek, vLLM and SGLang from one IR.

RTK output filtering

An orthogonal layer that shrinks repetitive tool_result tails.

Savings dashboard

Offline HTML, savings pinned to absolute dollars.

Zero-intrusion proxy

Point ANTHROPIC_BASE_URL at the local gateway — no code changes.

2 a.m. — where did all the money go?

The counter climbs to 2,847,103 tokens, and the line above reads cache_read: 0. All night, every turn fed the same 4,000-token system prompt from scratch, billed at full price. Take the exact same 6-turn conversation, drop it into TELOS, and flip two switches:
Moderaw input tokenscache_readCost for 6 turns
passthrough (today’s default)24,1510$0.3623
with TELOS018,701$0.0281 (−92.3%)
Scale to 1,000 sessions: $362 → $26.
Today's agent token efficiency is only 25%

Three steps to save 90%

1

Install

# Linux / macOS / WSL2 / Android (Termux)
curl -fsSL https://raw.githubusercontent.com/learningCatHD/telos-sdk/main/scripts/install.sh | bash
# or: pip install -U telos-sdk
2

Connect

telos init
Auto-detects claude-code / codex / openclaw / hermes, injects config into each, and starts the local gateway in the background. No changes to your agent code.
3

Observe

telos dashboard
Opens an offline HTML dashboard showing savings per call in absolute dollars.

How it fits together

Learn more

Architecture

The three-layer harness → bridge → engine design.

SWE-bench A/B

Pre-registered study: half the input bill, no resolved-rate regression.

CLI Reference

Every telos subcommand and flag.