Rocinante is 120 pages of hard-won patterns for making AI agents actually work: giving them memory that lasts, routing tasks to the right model, and cutting the waste that's burning your budget.
Every morning, your agent wakes up with amnesia. You refill the context window with yesterday's conversation, burn through tokens re-explaining your codebase, and by the time the agent is useful, you've spent $8 on what should have been a $0.50 task.
The default setup routes everything through one model. Searching files? Your most expensive model. Reading code? Same expensive model. Generating boilerplate? Still the expensive one. You're paying premium rates for work that a model at a fifth of the price handles just as well.
Meanwhile, you're watching your weekly usage tick up and you're not even sure where it's going. The tokens go to re-reading files, re-explaining context, and repeating searches your agent already ran yesterday. Rocinante helps you stop wasting tokens on busywork so you can use them for the things that actually matter.
The patterns to fix this exist. They're scattered across Discord threads, GitHub discussions, and the setups of a few dozen power users who figured it out through months of trial and error. Rocinante collects those patterns into a single, opinionated playbook.
Your agent forgets everything between sessions. This chapter fixes that. Build a memory system that survives restarts, grows with your projects, and costs nothing to search.
Stop sending everything to the cloud. Run search, prompts, and knowledge lookups on your own machine. Your code stays private, your costs drop, and your agent gets faster.
Use the smart model for thinking. Use the cheap model for scanning. Use the fast model for grunt work. Match the right tool to each task instead of paying top dollar for everything.
Your agent wastes tokens repeating itself, re-reading files it already knows, and asking the cloud for answers it has locally. These patterns cut the waste and improve what you get back.
Services crash. Sessions drop. Configs break. This chapter covers the unglamorous work that keeps your agent running 24/7: process management, health checks, and automatic recovery.
When you correct your agent, it should learn permanently. Not just for this session, but forever. Build agents that capture mistakes, promote lessons, and get better on their own.
This playbook wasn't assembled from Reddit threads and Discord screenshots. It started there, sure. But the patterns inside come from hundreds of hours of daily use: building, breaking, rebuilding, and evolving these systems on real infrastructure with real deadlines.
I'm a network and infrastructure security engineer. I run this stack every day on a single workstation with a local GPU. I've iterated through four major versions of the memory system, rewritten the local-first enforcement rules from scratch multiple times, and tracked the token cost of every mistake along the way.
The configurations in this playbook aren't hypothetical. They're extracted from a production setup managing 23 projects, 36 knowledge cards, semantic search over 28 repositories, and multi-model orchestration across three providers. Every gotcha, every workaround, every "don't do this" comes from doing it wrong first.
Engineering background matters here. Memory architecture is systems design. Token routing is network optimization. Model selection is capacity planning. If you've managed infrastructure, these patterns will feel familiar. You're just applying them to a different kind of system.
Stop paying for amnesia. Build systems that compound.
Get Rocinante — $49