How I Cut My AI Agent Costs by 60% Without Making It Dumber

I run an AI agent 24/7 on a Mac Mini. It manages my Instagram, monitors Google Ads, books tennis courts, drafts newsletters, and a bunch of other stuff I used to do manually.

Cool, right? Until I looked at the bill.

$320 in 8 days. About $40/day. Just on API credits.

That's $1,200/month for an assistant. At that point you might as well hire a human.

So I spent the last few weeks figuring out every possible way to cut costs without turning my agent into a potato. Here's what actually worked, ranked from biggest savings to smallest. Every method has exact commands and prompts you can copy-paste.

I'm basing these savings estimates on a medium-to-heavy OpenClaw user spending around $30-40/day before optimizing. Your numbers will vary but the ratios should hold.

1. Model Tiering: Use Cheap Models for Cheap Tasks

Estimated savings: $400-500/month

This is the single biggest win. If you're running Opus (or any top-tier model) for every single task, you're lighting money on fire. Most of what your agent does throughout the day is not hard: cron jobs, status checks, automated posts, simple lookups. Running those on a top-tier model is like hiring a PhD to sort your mail.

The fix: keep the expensive model for direct conversations where you need real reasoning. Drop everything else to a cheaper model.

How to set it up:

Your main model stays as-is (whatever you onboarded with). For cron jobs and sub-agents, you override the model per-job.

Tell your agent:

Switch all my cron jobs to use Sonnet instead of Opus.
Keep Opus for my main session only.

Or if you want to do it yourself, when creating or editing cron jobs, add the model parameter. Example cron job config:

{
  "sessionTarget": "isolated",
  "payload": {
    "kind": "agentTurn",
    "message": "Check the weather and send me a summary",
    "model": "anthropic/claude-sonnet-4-20250514"
  }
}
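If you'd rather script the override than edit each job by hand, the change boils down to setting payload.model on every job. Here's a minimal Python sketch that works on dictionaries shaped like the example config above. The helper name with_model is my own invention, and how you load and save your jobs depends on your setup, so treat this as an illustration and not an OpenClaw API.

```python
import copy

SONNET = "anthropic/claude-sonnet-4-20250514"  # model id from the example config above


def with_model(job: dict, model: str = SONNET) -> dict:
    """Return a copy of a cron job config with payload.model set to `model`.

    Hypothetical helper: it only edits a dict shaped like the example config.
    """
    updated = copy.deepcopy(job)  # leave the caller's dict untouched
    updated.setdefault("payload", {})["model"] = model
    return updated


job = {
    "sessionTarget": "isolated",
    "payload": {"kind": "agentTurn", "message": "Check the weather and send me a summary"},
}
cheap_job = with_model(job)
print(cheap_job["payload"]["model"])
```

Working on a copy means you can compare the before and after versions of a job before writing anything back.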

For sub-agents, tell your agent:

When you spawn sub-agents for simple tasks (file operations,
data fetching, formatting), use Sonnet. Only use Opus for
complex analysis or strategy work.

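Why it works: Sonnet is roughly 40-50% cheaper than Opus per token (the exact gap depends on which Opus and Sonnet versions you compare, so check current pricing), and for structured, repetitive tasks it performs basically the same. You won't notice a difference on roughly 90% of automated work.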
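To see how the per-token discount turns into a monthly number, here's a rough back-of-the-envelope sketch. The $1,200 bill is from this post. The share of spend that goes to automated work (0.85 here) is an assumption for illustration, not something I measured, so plug in your own.

```python
def tiering_savings(monthly_bill: float, automated_share: float, discount: float) -> float:
    """Dollars saved per month by moving the automated share of spend to a cheaper model."""
    return monthly_bill * automated_share * discount


# $1,200/month is the bill from this post; the 0.85 automated share is an assumption.
for discount in (0.40, 0.50):
    saved = tiering_savings(1200, automated_share=0.85, discount=discount)
    print(f"{discount:.0%} cheaper -> about ${saved:.0f}/month saved")
```

The takeaway: the savings scale with how much of your spend is automated. If most of your bill comes from long conversations on the main model, tiering will help less.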

2. Connect Your ChatGPT Subscription as a Secondary Provider

Estimated savings: $100-200/month

If you're already paying $20/month for ChatGPT Plus (or $200/month for Pro), you can connect that subscription to OpenClaw. Your agent gets access to OpenAI models through your flat-rate plan instead of paying per-token.

I wouldn't use it as your main brain (rate limits will throttle you during heavy conversations), but for background monitoring, simple cron jobs, and low-stakes tasks? Basically free.

Step-by-step setup:

Step 1. Install the Codex CLI globally:

npm install -g @openai/codex

Step 2. Run the OpenClaw onboard wizard with the OpenAI Codex auth choice:

openclaw onboard --auth-choice openai-codex

Step 3. It'll show a security warning. Select Yes to continue.

Step 4. Choose QuickStart for onboarding mode.

Step 5. When it asks about config handling, choose Use existing values so it doesn't overwrite your current setup.

Step 6. It'll give you an OAuth URL. Open it in your browser, log in with your ChatGPT account, authorize it, then paste the redirect URL back into the terminal.

Step 7. IMPORTANT: The wizard will set OpenAI as your default model. Switch it back immediately:

openclaw config set agents.defaults.model.primary anthropic/claude-opus-4-6

(Replace with whatever your main model was before.)

Step 8. Restart the gateway:

openclaw gateway restart

Now OpenAI is available as a secondary provider. You can route specific tasks to it by setting the model to openai-codex/gpt-5.3-codex on individual cron jobs or sub-agents.
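One way to think about the routing is as a lookup table from task type to model. The sketch below uses the model ids mentioned in this post. The task labels and the pick_model function are hypothetical names I made up for illustration; OpenClaw doesn't expose anything called that as far as this post is concerned.

```python
# Model ids are the ones used in this post; the task kinds are hypothetical labels.
MAIN = "anthropic/claude-opus-4-6"
CHEAP = "anthropic/claude-sonnet-4-20250514"
FLAT_RATE = "openai-codex/gpt-5.3-codex"

ROUTES = {
    "conversation": MAIN,             # direct chat that needs real reasoning
    "cron": CHEAP,                    # scheduled, structured tasks
    "sub_agent_simple": CHEAP,        # file operations, data fetching, formatting
    "background_monitor": FLAT_RATE,  # low-stakes checks on the subscription
}


def pick_model(task_kind: str) -> str:
    """Return the model for a task kind, defaulting to the cheap tier."""
    return ROUTES.get(task_kind, CHEAP)


print(pick_model("cron"))
```

Defaulting unknown tasks to the cheap tier is a deliberate choice here: anything that needs the expensive model has to be listed explicitly.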

Pro tip from Caleb Hodges (OpenClaw founder): Always make config changes from a VS Code terminal or a regular terminal window, not from within OpenClaw itself. That avoids the "self-surgery" problem where a restart during config changes locks you out.

3. Kill Unnecessary Heartbeats

Estimated savings: $50-150/month

Heartbeats are when your agent wakes up periodically to check on things: email, calendar, weather, notifications. Every heartbeat is an API call. If your agent is waking up every 15 minutes to check 4 different things, that's 96 wake-ups and, at one call per check, almost 400 API calls a day (96 × 4 = 384) just from heartbeats.

Most of those checks find nothing new. You're paying for the agent to wake up, read its context, think about it, and say "nothing to report." Over and over.

How to fix it:

Option A: Empty your heartbeat file when nothing needs monitoring. Tell your agent:

Clear out HEARTBEAT.md. I don't need periodic checks right now.
Only add tasks back when there's something specific to monitor.

Option B: Batch your checks. Instead of separate heartbeats for email, calendar, and weather, combine them:

Update HEARTBEAT.md to batch all periodic checks into one pass.
Check email, calendar, and weather in a single heartbeat instead
of separate ones. Only run checks 2-3 times per day, not every
15 minutes.

Option C: Replace heartbeat polling with cron jobs for specific times:

Instead of using heartbeats to check my calendar every 30 minutes,
create a cron job that checks at 8am and 2pm only. Use Sonnet for it.

The math: An empty heartbeat file costs almost zero tokens (agent just replies "HEARTBEAT_OK"). A full heartbeat with 4 checks on Opus every 15 minutes can burn $3-5/day in tokens. That's $90-150/month for... checking if you have new email.
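Here's that arithmetic as a small sketch you can rerun with your own interval and check count. It assumes each check in a heartbeat counts as its own API call, which matches the "almost 400 calls" figure earlier in this section but may not match how your setup batches things.

```python
MINUTES_PER_DAY = 24 * 60


def heartbeat_calls_per_day(interval_minutes: int, checks_per_heartbeat: int = 1) -> int:
    """API calls per day if every check in every heartbeat is its own call."""
    return (MINUTES_PER_DAY // interval_minutes) * checks_per_heartbeat


def monthly_cost(daily_cost: float, days: int = 30) -> float:
    """Scale a daily cost to a month of `days` days."""
    return daily_cost * days


print(heartbeat_calls_per_day(15, 4))    # 384
print(monthly_cost(3), monthly_cost(5))  # 90 150
```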

4. Cron Jobs Over Constant Polling

Estimated savings: $50-100/month

This is related to heartbeats but broader. Anywhere your agent is checking something repeatedly, replace it with a scheduled job that fires at exactly the right time.

The difference:

Polling: "Check every 30 minutes if something changed" = 48 API calls/day
Cron: "Run once at 8:00 AM" = 1 API call/day

How to set it up:

Tell your agent to create cron jobs for your recurring tasks:

Create a cron job that runs every weekday at 8:30am CT.
It should check my email for anything urgent and send me a
WhatsApp summary. Use Sonnet as the model.

Create a cron job that runs every Monday at 9am CT.
It should pull my calendar for the week and send me an overview.
Use Sonnet.

Create a daily cron job at 7am that checks the weather forecast
and only messages me if rain is expected. Use Sonnet.

Each of these runs once, does its thing, and stops. No wasted cycles. And since they're isolated sessions, they start with clean context (fewer tokens) and can use a cheaper model.
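To put numbers on the difference, here's a quick comparison of 30-minute polling against the three example jobs above. The cron expressions in the comments are standard 5-field syntax (minute, hour, day of month, month, day of week); whether your scheduler accepts them in exactly that form is an assumption on my part.

```python
def polling_calls_per_day(interval_minutes: int) -> int:
    """Calls per day when checking on a fixed interval around the clock."""
    return (24 * 60) // interval_minutes


# Runs per week for the three example jobs above.
CRON_RUNS_PER_WEEK = {
    "weekday email summary, 8:30am": 5,  # 30 8 * * 1-5
    "Monday calendar overview, 9am": 1,  # 0 9 * * 1
    "daily weather check, 7am": 7,       # 0 7 * * *
}

polling_per_week = polling_calls_per_day(30) * 7
cron_per_week = sum(CRON_RUNS_PER_WEEK.values())
print(polling_per_week, cron_per_week)  # 336 13
```

That's 13 scheduled runs a week against 336 polling calls for a single 30-minute check.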

5. Sub-Agents for Heavy Lifting

Estimated savings: $30-80/month

By the time you need a big task done (research, writing, building something), your main session has usually been accumulating conversation history all day. Every message gets more expensive because the model has to process everything that came before it.

A sub-agent starts fresh. Clean context. Just the task description. Way fewer tokens.
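A rough way to picture the gap: the input tokens billed on any turn are the fixed context plus all the history before it. The sketch below uses made-up numbers for the base context and per-turn growth, and it ignores prompt caching and history compaction, so read it as the shape of the problem and not a billing calculator.

```python
def input_tokens_for_turn(turn: int, base_tokens: int, tokens_per_turn: int) -> int:
    """Input tokens billed on a given turn: fixed context plus all prior turns."""
    return base_tokens + (turn - 1) * tokens_per_turn


# All numbers below are made up for illustration.
BASE = 5_000    # system prompt and workspace files
PER_TURN = 800  # average tokens each earlier exchange adds to history

main_session = input_tokens_for_turn(turn=101, base_tokens=BASE, tokens_per_turn=PER_TURN)
sub_agent = input_tokens_for_turn(turn=1, base_tokens=BASE, tokens_per_turn=PER_TURN)
print(main_session, sub_agent)  # 85000 5000
```

Under those assumptions, the same task costs 17x more input tokens when it starts 100 turns deep in the main session than when it starts in a fresh sub-agent.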

How to use them:

You don't need to do anything technical. Just tell your agent:

Spawn a sub-agent to research the top 10 project management tools
for small teams. Have it write up a comparison and send me the results.

Spawn a sub-agent to draft a blog post about [topic].
Use Sonnet for the draft, I'll review it in our main chat.

Spawn a sub-agent to analyze this spreadsheet and pull out
the key trends. Use Sonnet.

Your agent handles the routing. The sub-agent does the work in isolation (cheap, clean context), then reports back to your main conversation.

When to use the expensive model for sub-agents: Only when the task genuinely needs it. Strategy analysis, complex reasoning, anything where quality really matters. For drafts, data pulls, formatting, file operations... Sonnet is fine.

6. Shorter System Prompts and Smarter Context

Estimated savings: $20-60/month

Every message to the AI includes the full system prompt and all your context files, so a longer prompt means more tokens burned on every single reply.

This one sneaks up on you. Your AGENTS.md, SOUL.md, MEMORY.md, and other workspace files grow over time. Each line is tokens you pay for on every message.

How to audit it:

How big are my workspace context files? Show me the token count
for AGENTS.md, SOUL.md, MEMORY.md, TOOLS.md, and any other files
loaded into your system prompt.

Review my MEMORY.md and flag anything outdated or no longer relevant.
Trim what we don't need anymore.

Are there sections of AGENTS.md that could be shorter without
losing important instructions? Suggest cuts.

Rules of thumb:

MEMORY.md should be curated highlights, not a raw log of everything that ever happened
Move completed project details to archive files that aren't loaded every session
Keep TOOLS.md to stuff you actually use regularly
If a section hasn't been relevant in 2+ weeks, it probably doesn't need to be in the always-loaded files
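If you want a rough number without asking the agent, you can estimate it yourself. The sketch below uses the common rule of thumb of about 4 characters per token for English text, which is only an approximation and not the model's real tokenizer. The file names come from this section; the workspace path (the current directory) is an assumption, so point it at wherever your files actually live.

```python
from pathlib import Path


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token for English text."""
    return len(text) // 4


def monthly_overhead_tokens(context_tokens: int, messages_per_day: int, days: int = 30) -> int:
    """Tokens spent re-sending always-loaded context with every message."""
    return context_tokens * messages_per_day * days


def audit(workspace: Path, names: list[str]) -> dict[str, int]:
    """Estimated tokens per file; files that don't exist are skipped."""
    return {
        name: estimate_tokens((workspace / name).read_text(encoding="utf-8"))
        for name in names
        if (workspace / name).is_file()
    }


if __name__ == "__main__":
    files = ["AGENTS.md", "SOUL.md", "MEMORY.md", "TOOLS.md"]
    for name, tokens in audit(Path("."), files).items():
        print(f"{name}: ~{tokens} tokens")
```

For a sense of scale, 10,000 tokens of always-loaded context at 100 messages a day works out to 30 million input tokens a month before any caching discount.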

7. Local Tools Over Paid APIs

Estimated savings: $10-30/month

Some things don't need a paid API at all. Check what your agent is using and swap in free alternatives where possible.

Memory search (embeddings):

What provider are you using for memory search embeddings?
If it's a paid API, can we switch to local embeddings?

To switch to local embeddings, set in your config:

openclaw config set memorySearch.provider local

Web search:

Brave Search has a free tier of 2,000 requests/month. That's plenty for most people. If you're using Perplexity or another paid search, consider switching:

Are we using the Brave free tier for web search?
If not, let's switch to save on search API costs.

Web fetch:

The built-in web fetcher works fine for most pages, so you can often skip Firecrawl (paid). Unless you're scraping JavaScript-heavy sites regularly, you probably don't need Firecrawl.

8. Monitor Your Spending

Estimated savings: Prevents waste (not a direct cut, but catches problems)

You can't optimize what you can't see. I didn't realize how bad my spending was until I actually looked at the numbers.

Commands to check your usage:

In chat with your agent:

/status

Shows your current model, context size, and estimated cost for the last reply.

/usage full

Appends cost footers to every reply so you can see what each message costs.

From terminal:

openclaw status --usage

Shows provider usage windows and quota snapshots.

Tell your agent to track costs:

Start noting the estimated daily API cost in your daily memory file.
I want to see the trend over the next week.

Check your Anthropic dashboard (console.anthropic.com) and OpenAI dashboard (platform.openai.com) weekly. Set a budget alert if your provider supports it.
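Once you have a few days of logged costs, a few lines of code are enough to spot the trend. This is a minimal sketch with a hypothetical week of numbers (not my real data); the function names and the $20 daily budget are placeholders.

```python
from statistics import mean


def project_monthly(daily_costs: list[float], days: int = 30) -> float:
    """Project a monthly bill from the average of the logged daily costs."""
    return mean(daily_costs) * days


def days_over_budget(daily_costs: list[float], daily_budget: float) -> list[int]:
    """1-based indexes of the days that exceeded the daily budget."""
    return [i for i, cost in enumerate(daily_costs, start=1) if cost > daily_budget]


# Hypothetical week of logged costs in dollars, not real data.
week = [18.0, 16.5, 22.0, 15.0, 17.5, 30.0, 14.0]
print(round(project_monthly(week)))  # 570
print(days_over_budget(week, 20.0))  # [3, 6]
```

The same projection reproduces the number from the top of this post: $320 over 8 days averages $40/day, which comes to $1,200 over 30 days.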

The Full Picture

Here's what my stack looks like now vs before:

Before (everything on Opus): ~$1,200/month

After optimization:

Model tiering (Sonnet for cron/sub-agents): $400-500 saved
ChatGPT subscription as secondary provider: $100-200 saved
Killing unnecessary heartbeats: $50-150 saved
Cron jobs replacing constant polling: $50-100 saved
Sub-agents for heavy tasks: $30-80 saved
Trimming system prompt/context: $20-60 saved
Local tools over paid APIs: $10-30 saved

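New total: ~$450-550/month (roughly 55-60% reduction). The line items above overlap, so they don't stack to their full sum: a heartbeat you've deleted can't also get cheaper by moving to Sonnet.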
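If you want to sanity-check the totals, here's the arithmetic. Adding up the per-method ranges gives more than the actual drop in the bill, which suggests the individual estimates overlap and shouldn't be read as strictly additive.

```python
BEFORE = 1200  # monthly bill before optimizing, from this post

# (low, high) monthly savings estimates for each method, as listed above.
ESTIMATES = {
    "model tiering": (400, 500),
    "ChatGPT subscription": (100, 200),
    "heartbeats": (50, 150),
    "cron over polling": (50, 100),
    "sub-agents": (30, 80),
    "context trimming": (20, 60),
    "local tools": (10, 30),
}

low = sum(lo for lo, _ in ESTIMATES.values())
high = sum(hi for _, hi in ESTIMATES.values())
print(low, high)  # 660 1120

# Reduction implied by a new total of roughly $450-550/month.
for new_total in (450, 550):
    print(f"${new_total}: {(BEFORE - new_total) / BEFORE:.1%} reduction")
```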

And honestly? The agent performs better now. The expensive model only kicks in when it actually matters. Everything else runs leaner and faster because it's not overthinking simple tasks.

The Bottom Line

You don't have to choose between a smart agent and an affordable one. You just have to stop using the expensive model for everything.

Most AI agent users I've talked to are running one model for all tasks. That's like driving a Ferrari to get groceries. It works, but you're burning premium gas for no reason.

Start with #1 (model tiering) and #2 (connect your ChatGPT subscription). By my estimates those two alone can cut your bill roughly in half. Then work down the list as you feel like it.

The goal isn't to be cheap. It's to be smart about where the money goes.