TELOS - TELOS

One canonical IR — tools, system, turns, and memory — runs unchanged across Anthropic · OpenAI · DeepSeek · vLLM · SGLang. Most agent frameworks treat the KV cache as a runtime gift the inference engine may or may not hand you. TELOS inverts that: cache reuse becomes a structural property of the prompt itself. Stabilize the bytes that are genuinely stable, and the cache cannot be invalidated.

Real 6-turn session −92.3%. Cost reported in absolute $/query-resolved — ratios can be gamed; dollars can’t. From the LEAP Lab @ Tsinghua University.

Quickstart

Install, connect, and watch the savings in three commands.

The Protocol

Three-color bands and monotonic append — why the cache cannot break.

Support Matrix

Harnesses, frontier models, and self-hosted inference frameworks.

Why TELOS

TELOS solves exactly two things.

Push token efficiency to the limit

A 6-turn real session drops −92.3%; a controlled 48-call run drops −36.6% (net −$2.16). Every cent is accounted for in absolute $/query-resolved.

Return context sovereignty to you

TelosIR is an engine-agnostic, serializable, portable context representation — your persona, your tools, your 20-turn thread, all on the same stone tablet. Hand it to Claude today, move it to DeepSeek tomorrow, run it on local vLLM tonight.

Key capabilities

Three-color bands

Every block declares its cache lifetime at birth: PIN / FOLD / DROP.

Monotonic append

The prompt is an append-only stream — submitted bytes are never mutated.

Multi-engine adapters

Anthropic, OpenAI, DeepSeek, vLLM and SGLang from one IR.

RTK output filtering

An orthogonal layer that shrinks repetitive tool_result tails.

Savings dashboard

Offline HTML, savings pinned to absolute dollars.

Zero-intrusion proxy

Point ANTHROPIC_BASE_URL at the local gateway — no code changes.

2 a.m. — where did all the money go?

The counter climbs to 2,847,103 tokens, and the line above reads cache_read: 0. All night, every turn fed the same 4,000-token system prompt from scratch, billed at full price. Take the exact same 6-turn conversation, drop it into TELOS, and flip two switches:

Mode	raw input tokens	cache_read	Cost for 6 turns
passthrough (today’s default)	24,151	0	$0.3623
with TELOS	0	18,701	$0.0281 (−92.3%)

Scale to 1,000 sessions: $362 → $26.

Today's agent token efficiency is only 25%

Three steps to save 90%

Install

# Linux / macOS / WSL2 / Android (Termux)
curl -fsSL https://raw.githubusercontent.com/learningCatHD/telos-sdk/main/scripts/install.sh | bash
# or: pip install -U telos-sdk

Connect

telos init

Auto-detects claude-code / codex / openclaw / hermes, injects config into each, and starts the local gateway in the background. No changes to your agent code.

Observe

telos dashboard

Opens an offline HTML dashboard showing savings per call in absolute dollars.

How it fits together

Learn more

Architecture

The three-layer harness → bridge → engine design.

SWE-bench A/B

Pre-registered study: half the input bill, no resolved-rate regression.

CLI Reference

Every telos subcommand and flag.

Quickstart

The Protocol

Support Matrix

​Why TELOS

Push token efficiency to the limit

Return context sovereignty to you

​Key capabilities

Three-color bands

Monotonic append

Multi-engine adapters

RTK output filtering

Savings dashboard

Zero-intrusion proxy

​2 a.m. — where did all the money go?

​Three steps to save 90%

​How it fits together

​Learn more

Architecture

SWE-bench A/B

CLI Reference

Why TELOS

Key capabilities

2 a.m. — where did all the money go?

Three steps to save 90%

How it fits together

Learn more