ROCINANTE — Stop Burning Tokens

You're paying for context,
not capability.

Every morning, your agent wakes up with amnesia. The context window fills with yesterday's conversation, you burn through tokens re-explaining your codebase, and by the time it's useful, you've spent $8 on what should have been a $0.50 task.

The default setup routes everything through one model. Searching files? Your most expensive model. Reading code? Same expensive model. Generating boilerplate? Still the expensive one. You're paying premium rates for work that a model five times cheaper handles identically.

Meanwhile, you're watching your weekly usage tick up and you're not even sure where it's going. Tokens burned on re-reading files, re-explaining context, and repeating searches your agent already did yesterday. Rocinante helps you stop wasting tokens on busywork so you can use them for the things that actually matter.

The patterns to fix this exist. They're scattered across Discord threads, GitHub discussions, and the setups of a few dozen power users who figured it out through months of trial and error. Rocinante collects those patterns into a single, opinionated playbook.

What's Inside

Six chapters. Zero filler.

Memory Architecture

Your agent forgets everything between sessions. This chapter fixes that. Build a memory system that survives restarts, grows with your projects, and costs nothing to search.
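
As a rough illustration of memory that survives restarts and is free to search, here is a minimal sketch. It is not the playbook's actual memory system: the `MemoryStore` class, the JSON Lines file layout, and the keyword matching are all assumptions made for this example.

```python
import json
from pathlib import Path


class MemoryStore:
    """Append-only notes persisted to a JSON Lines file (hypothetical layout)."""

    def __init__(self, path):
        self.path = Path(path)

    def remember(self, project, note):
        # One JSON record per line, appended, so earlier notes are never rewritten.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"project": project, "note": note}) + "\n")

    def search(self, query):
        # Local, case-insensitive keyword match: no API call, no tokens spent.
        if not self.path.exists():
            return []
        words = query.lower().split()
        hits = []
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                text = record["note"].lower()
                if all(word in text for word in words):
                    hits.append(record)
        return hits
```

Because the notes live in a file rather than in the context window, a fresh session can create a new `MemoryStore` on the same path and find what an earlier session wrote.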

Local-First Routing

Stop sending everything to the cloud. Run search, prompts, and knowledge lookups on your own machine. Your code stays private, your costs drop, and your agent gets faster.
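
The local-first idea can be sketched as "try your own machine, go to the cloud only on a miss". The function below is an assumption for illustration: `local_lookup` and `cloud_call` are hypothetical callables you would supply, not part of any real framework.

```python
def answer(query, local_lookup, cloud_call):
    """Return (source, result), preferring the local lookup.

    local_lookup and cloud_call are hypothetical callables: the first
    returns None on a miss, the second is the paid remote request.
    """
    result = local_lookup(query)
    if result is not None:
        return ("local", result)
    # Only pay for a cloud round trip when nothing local can answer.
    return ("cloud", cloud_call(query))
```

A dictionary's `get` method works as a stand-in for `local_lookup` while experimenting, since it returns `None` on a miss.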

Multi-Model Orchestration

Use the smart model for thinking. Use the cheap model for scanning. Use the fast model for grunt work. Match the right tool to each task instead of paying top dollar for everything.
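
One way to picture this is a plain routing table. The sketch below is only an assumption about how such a table might look: the task kinds and the tier names `smart`, `cheap`, and `fast` are placeholders, not identifiers from any provider.

```python
# Hypothetical task kinds mapped to the three tiers described above.
ROUTES = {
    "plan": "smart",         # thinking
    "review": "smart",
    "search": "cheap",       # scanning
    "read": "cheap",
    "boilerplate": "fast",   # grunt work
    "format": "fast",
}


def pick_tier(task_kind, default="smart"):
    """Return the tier for a task kind; unknown kinds fall back to the default."""
    return ROUTES.get(task_kind, default)
```

Defaulting unknown work to the smart tier is a deliberate choice in this sketch: a misrouted task costs more, but it does not silently get a weaker answer.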

Token Optimization

Your agent wastes tokens repeating itself, re-reading files it already knows, and asking the cloud for answers it has locally. These patterns cut the waste and improve what you get back.
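
A small example of cutting one kind of waste, re-reading files the agent already knows: track a hash of each file's content and skip the resend when nothing changed. The `ReadCache` class is a hypothetical helper written for this illustration, not a pattern quoted from the playbook.

```python
import hashlib


class ReadCache:
    """Remember which file contents were already sent (hypothetical helper)."""

    def __init__(self):
        self._seen = {}  # path -> SHA-256 hex digest of the content last sent

    def needs_resend(self, path, content):
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self._seen.get(path) == digest:
            return False  # unchanged since last send; spend no tokens on it
        self._seen[path] = digest
        return True
```

Hashing the content instead of checking modification times means a file that was touched but not changed still counts as known.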

Production Operations

Services crash. Sessions drop. Configs break. This chapter covers the unglamorous work that keeps your agent running 24/7: process management, health checks, and automatic recovery.
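
Health checks and automatic recovery can be sketched as a loop that restarts an unhealthy service with growing delays. Everything here is an assumption for illustration: `is_healthy` and `restart` are hypothetical callables, and the retry limit and delays are arbitrary defaults.

```python
import time


def keep_alive(is_healthy, restart, max_restarts=3, base_delay=1.0, sleep=time.sleep):
    """Restart an unhealthy service with exponential backoff.

    Returns the number of restarts performed. Raises RuntimeError if the
    service is still unhealthy after max_restarts attempts.
    """
    restarts = 0
    while not is_healthy():
        if restarts >= max_restarts:
            raise RuntimeError("service did not recover")
        # Wait 1x, 2x, 4x ... base_delay so a crash loop does not hammer the host.
        sleep(base_delay * 2 ** restarts)
        restart()
        restarts += 1
    return restarts
```

Passing `sleep` as a parameter keeps the function testable without real waiting. In day-to-day use a process manager would typically own this loop.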

Self-Improving Agents

When you correct your agent, it should learn permanently. Not just for this session, but forever. Build agents that capture mistakes, promote lessons, and get better on their own.
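
The capture-and-promote idea might look roughly like this. The sketch assumes a correction becomes a standing rule once it has been given a set number of times. That threshold and the `LessonLog` class are inventions for this example, and writing the rules to disk so they persist across sessions is left out for brevity.

```python
from collections import Counter


class LessonLog:
    """Count repeated corrections and promote them to rules (hypothetical)."""

    def __init__(self, promote_after=2):
        self.promote_after = promote_after
        self.counts = Counter()
        self.rules = []

    def correct(self, lesson):
        # Normalize so the same correction phrased with different casing matches.
        key = lesson.strip().lower()
        self.counts[key] += 1
        if self.counts[key] >= self.promote_after and key not in self.rules:
            self.rules.append(key)
        # True once the lesson is a standing rule.
        return key in self.rules
```

The promoted `rules` list is what you would load into the agent's standing instructions at the start of each session.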

This is my daily system.

This playbook wasn't assembled from Reddit threads and Discord screenshots. It started there, sure. But the patterns inside come from hundreds of hours of daily use: building, breaking, rebuilding, and evolving these systems on real infrastructure with real deadlines.

I'm a network and infrastructure security engineer. I run this stack every day on a single workstation with a local GPU. I've iterated through four major versions of the memory system, rewritten the local-first enforcement rules from scratch multiple times, and tracked the token cost of every mistake along the way.

The configurations in this playbook aren't hypothetical. They're extracted from a production setup managing 23 projects, 36 knowledge cards, semantic search over 28 repositories, and multi-model orchestration across three providers. Every gotcha, every workaround, every "don't do this" comes from doing it wrong first.

Engineering background matters here. Memory architecture is systems design. Token routing is network optimization. Model selection is capacity planning. If you've managed infrastructure, these patterns will feel familiar. You're just applying them to a different kind of system.

Questions.

Is this only for OpenClaw?

The patterns are demonstrated in OpenClaw, but the architecture (memory, routing, orchestration) applies to any agent framework: Claude Code, Cursor, or a custom build.

I spend $200/mo already. Will this help?

That's exactly who this is for. Even at $200/mo, Opus has hourly and weekly token limits. Running it unoptimized for every task will burn through them fast. If your agent still forgets between sessions and routes everything through the most expensive model, most of that budget is going to waste. Rocinante shows you how to route tasks intelligently so Opus handles what matters and cheaper models handle the rest.

Do I need a GPU?

Recommended but not required. A GPU with 8GB of VRAM runs embeddings and local search. CPU works too, just slower. Both paths are covered.

Do I need to be technical?

You should be comfortable with a terminal and editing config files. You don't need to be a developer. The playbook walks through every step with copy-paste commands and explains what each one does.

Refund policy?

If the playbook doesn't save more than its cost in the first month, full refund. No questions.

Your agent
is expensive
and forgetful.

Your agent should
remember yesterday.
