cache_control injection — the only differences are the
process boundary, error handling, and streaming.
Path B · HTTP reverse proxy
Recommended. Zero code changes: set ANTHROPIC_BASE_URL to the local gateway. This is what telos init configures.
Path A · SDK transport
In-process. Swap the LLM client for the TELOS transport; the duck-typed interface is identical.
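
As a rough illustration of the two integration styles, here is a minimal sketch. The names `resolve_base_url`, `MessageClient`, `FakeClient`, `WrappingTransport`, and `run_agent`, and the gateway address used in the usage note, are hypothetical stand-ins rather than TELOS's actual API or default port. The only point is that Path B changes where requests go through an environment variable, while Path A changes which object the agent calls.

```python
import os
from typing import Any, Mapping, Protocol

# Default endpoint used by the Anthropic SDKs when no override is set.
DEFAULT_BASE_URL = "https://api.anthropic.com"


def resolve_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Path B: a harness that honors ANTHROPIC_BASE_URL is redirected by env alone."""
    return env.get("ANTHROPIC_BASE_URL", DEFAULT_BASE_URL)


class MessageClient(Protocol):
    """The duck-typed surface the agent code depends on (hypothetical)."""

    def create(self, **request: Any) -> dict: ...


class FakeClient:
    """Stand-in for a real LLM client; echoes the request it receives."""

    def create(self, **request: Any) -> dict:
        return {"echo": request}


class WrappingTransport:
    """Path A: same `create` surface, with an in-process step before the call."""

    def __init__(self, inner: MessageClient) -> None:
        self._inner = inner

    def create(self, **request: Any) -> dict:
        # Placeholder for whatever in-process transformation a transport applies.
        return self._inner.create(**{**request, "wrapped": True})


def run_agent(client: MessageClient) -> dict:
    """Agent code is unchanged whichever client object it is handed."""
    return client.create(model="example-model", messages=[])
```

With this shape, `run_agent(FakeClient())` and `run_agent(WrappingTransport(FakeClient()))` use identical calling code, and `resolve_base_url({"ANTHROPIC_BASE_URL": "http://127.0.0.1:8080"})` returns the overridden address (the port here is an assumed example).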
How they compare
| | Path B · Proxy | Path A · SDK transport |
|---|---|---|
| Integration | ANTHROPIC_BASE_URL env var, zero code change | swap the client object |
| Process model | out-of-process (separate gateway) | in-process |
| Streaming | SSE-aware reverse proxy | direct SDK streaming |
| Session state | keyed by session_id in an LRU registry | one BridgeSessionState per transport instance |
| Best for | any harness that respects ANTHROPIC_BASE_URL / OPENAI_BASE_URL | apps you control the source of |
| Failure mode | non-strict by default: degrades to passthrough | raises in-process |
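
Two rows of the table, session state and failure mode, may be easier to picture in code. The sketch below is an assumption-laden illustration, not the gateway's implementation: `SessionRegistry`, `run_bridge`, the `max_sessions` parameter, and the `{"turns": 0}` state shape are all hypothetical. It shows an LRU registry keyed by `session_id`, and a non-strict wrapper that falls back to forwarding the request unchanged when the transformation fails.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict


class SessionRegistry:
    """LRU map from session_id to per-session state (hypothetical shape)."""

    def __init__(self, max_sessions: int = 2) -> None:
        self._max = max_sessions
        self._states: "OrderedDict[str, Dict[str, Any]]" = OrderedDict()

    def get(self, session_id: str) -> Dict[str, Any]:
        state = self._states.get(session_id)
        if state is not None:
            self._states.move_to_end(session_id)  # mark as most recently used
            return state
        state = {"turns": 0}
        self._states[session_id] = state
        if len(self._states) > self._max:
            self._states.popitem(last=False)  # evict the least recently used
        return state

    def __contains__(self, session_id: str) -> bool:
        return session_id in self._states


def run_bridge(
    raw: Dict[str, Any],
    bridge: Callable[[Dict[str, Any]], Dict[str, Any]],
    strict: bool = False,
) -> Dict[str, Any]:
    """Non-strict: on failure, pass the request through untouched. Strict: re-raise."""
    try:
        return bridge(raw)
    except Exception:
        if strict:
            raise
        return raw
```

In this sketch, a registry capped at two sessions that sees `a`, `b`, `a`, `c` in that order evicts `b`, because `a` was touched more recently. An in-process transport would simply hold one state object instead of a registry.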
Both paths share the same pure function, proxy/pipeline.py:process_anthropic_request(raw, ...), which splits the work into parse → bridge → emit and so eliminates any wire drift between the two.
A request, end to end (Path B, mode=both)
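
The parse → bridge → emit split of the shared function can be sketched as three small pure steps. This is a sketch under stated assumptions, not the contents of proxy/pipeline.py: the function names `parse`, `bridge`, `emit`, and `process_request_sketch` are hypothetical, the real signature is not reproduced here, and the bridge step uses a single cache_control marker on the last system block purely as a placeholder transformation.

```python
import json
from typing import Any, Dict


def parse(raw: bytes) -> Dict[str, Any]:
    """Decode the wire body into a plain dict."""
    return json.loads(raw)


def bridge(request: Dict[str, Any]) -> Dict[str, Any]:
    """Placeholder transformation: tag the last system block with cache_control.

    Returns a new dict and leaves the input untouched, so the step stays pure.
    """
    system = request.get("system")
    if not isinstance(system, list) or not system:
        return request  # nothing to tag; forward as-is
    blocks = [dict(block) for block in system]
    blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return {**request, "system": blocks}


def emit(request: Dict[str, Any]) -> bytes:
    """Serialize back to bytes for the upstream call."""
    return json.dumps(request, separators=(",", ":")).encode("utf-8")


def process_request_sketch(raw: bytes) -> bytes:
    """Same input always yields the same output, whichever path calls it."""
    return emit(bridge(parse(raw)))
```

Because the function takes bytes and returns bytes with no I/O or hidden state, a proxy handler and an in-process transport can both call it and produce byte-identical upstream requests, which is the property the shared function is meant to guarantee.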
Pick one
- You run a CLI agent (Claude Code, Codex, OpenClaw, Hermes). Use Path B: run telos init and you’re done. See Harness integration.
- You’re building your own agent in Python. Either works; Path A keeps everything in-process. See SDK transport.