The Rate Limit Cascade: How wakeMode Took Down Five Providers in Ten Seconds

The Restart

I restarted the OpenClaw container. Standard maintenance — updated some config, needed the process to pick up changes. Should take 10 seconds.

What actually happened took considerably longer to understand.

Within a minute, every free-tier LLM provider in my fallback chain was exhausted. Groq hit its 500K token daily limit. Cerebras returned 429s. SambaNova stopped responding. OpenRouter started returning 400 errors. NVIDIA NIM rate-limited.

Five providers. All dead. From a container restart.

The Root Cause: wakeMode

OpenClaw has a config flag called wakeMode. When set to "now", every cron job fires immediately on container startup. The intent is reasonable — if the container’s been down, catch up on missed scheduled work.

I had 13 active cron jobs. Some ran hourly. Some ran every 8 hours. A morning briefing, a site health check, a content scout, a competitive monitor, a watchdog, a dev pipeline check, a security audit, a playground tester, a memory consolidation job…

All 13 fired simultaneously.

Each job hits an LLM provider. Most of them hit the same provider — Groq, because it’s the fastest free tier and was configured as the default. So within seconds, Groq saw 13 concurrent requests from the same API key.
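The startup burst can be sketched in a few lines. This is a hypothetical model, not OpenClaw's actual code: with "now" semantics and no stagger, every job's first LLM call lands on the default provider at the same instant.

```python
import concurrent.futures

# Hypothetical sketch of wakeMode: "now" -- not OpenClaw's real code.
# 13 jobs, all configured with the same default provider.
JOBS = [f"job-{i}" for i in range(13)]

def run_job(name: str) -> str:
    # Each job's first action is an LLM call to the default
    # provider (Groq, in the incident).
    return f"{name} -> groq"

def wake_now(jobs):
    # "now" semantics: fire every job immediately, in parallel.
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return list(pool.map(run_job, jobs))

results = wake_now(JOBS)
print(len(results), "concurrent requests to the same API key")
```

Nothing in this path asks "when did this job last run?" or "how many other jobs are starting right now?" — that's the whole problem.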

That’s when the cascade started.

The Cascade

Groq hit its daily token limit. The fallback chain kicked in. Next provider: Cerebras. But all 13 jobs were failing over at the same time, and Cerebras got hammered.

Cerebras rate-limited. Next in chain: SambaNova. Same pattern.

SambaNova exhausted its quota. Next: OpenRouter. And here’s where it got weird — OpenRouter started returning 400 Bad Request, not 429 Too Many Requests. The gateway was sending malformed retry requests. Not my code’s fault — it’s an upstream bug in the OpenClaw gateway that only surfaces under high concurrent load. The retry payload gets corrupted when multiple jobs are failing over through the same provider simultaneously.
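The amplification is easy to model. The sketch below uses made-up per-burst capacities (not the providers' real limits) to show how the overflow from each provider hammers the next one in the chain:

```python
# Toy model of the cascade. Capacities are illustrative numbers,
# not the providers' actual limits.
CHAIN = ["groq", "cerebras", "sambanova", "openrouter", "nvidia-nim"]
CAPACITY = {p: 5 for p in CHAIN}  # requests each can absorb per burst

def call_with_fallback(chain, capacity):
    """Try each provider in order; return the one that accepted, or None."""
    for provider in chain:
        if capacity[provider] > 0:
            capacity[provider] -= 1
            return provider
    return None  # every provider exhausted

# 13 jobs failing over simultaneously: each provider's overflow
# becomes the next provider's burst.
served = [call_with_fallback(CHAIN, CAPACITY) for _ in range(13)]
print(served)  # 5 land on groq, 5 on cerebras, 3 on sambanova
```

The fallback chain doesn't shed load; it relocates the burst intact, provider by provider, until something absorbs it or everything is exhausted.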

By the time the cascade reached NVIDIA NIM, the fifth provider in the chain, there was nothing left. Every provider was either exhausted, rate-limited, or rejecting malformed requests.

Total time from restart to complete cascade failure: maybe 10 seconds.

Why 63 Calls Per Day Was Too Many

Even without the cascade, the math didn’t work. 13 cron jobs, each averaging 4–5 LLM calls a day, comes to roughly 63 calls per day. Free-tier providers have generous-sounding limits until you actually calculate what 63 calls looks like in tokens:

  • Groq: 500K tokens/day. 63 calls with moderate context easily eats that.
  • Cerebras: Similar daily limits, undocumented but observable.
  • SambaNova: Token budget that seemed fine until it wasn’t.
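The Groq arithmetic is worth doing explicitly. The 500K/day limit comes from the incident; the per-call token figure below is an assumption for "moderate context" (prompt plus completion), but any estimate in that ballpark lands on the wrong side of the limit:

```python
# Back-of-the-envelope token math. Only the 500K/day Groq limit is
# from the incident; 8K tokens/call is an assumed "moderate context"
# average (prompt + completion).
calls_per_day = 63
tokens_per_call = 8_000
daily_burn = calls_per_day * tokens_per_call

groq_daily_limit = 500_000
print(daily_burn, daily_burn > groq_daily_limit)  # 504000 True
```

At that burn rate there is no headroom at all: one retry loop, one unusually long context, or one restart burst pushes the day's budget over the edge.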

I was running right at the edge of every provider’s free tier, every single day. The restart just compressed a day’s worth of calls into 10 seconds.

The Fix: Fewer Jobs, Deeper Fallbacks

Step 1: Reduce the cron jobs. 13 down to 6.

The disabled jobs were mostly redundant. The “watchdog” job checked if services were up — but that’s what Uptime Kuma does, with actual monitoring instead of periodic LLM-powered curl wrappers. The “competitive monitor” and “content scout” were nice-to-haves that burned tokens daily for marginal value.

Six active jobs now:

  • Morning site status (8 AM)
  • Site and services health (every 8 hours)
  • Playground check (every 6 hours)
  • Security audit (11 PM)
  • Weekly summary (Sundays)
  • Integration connectivity (Wednesdays)

Nine LLM calls per day instead of 63.

Step 2: Expand fallback chains. Every agent tier now has 5+ providers configured. If Groq’s down, it tries Cerebras, then SambaNova, then OpenRouter with Gemini, then Qwen, then NVIDIA NIM. Deep enough that no single provider outage causes a visible failure.
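A minimal version of that chain walk might look like the sketch below. The provider order comes from the chain described above; the `send` callable and its error behavior are assumptions, since the real gateway's client interface isn't shown here:

```python
# Sketch of a 5+ deep fallback chain. Provider order matches the
# chain described in the post; the send() interface is hypothetical.
FALLBACK_CHAIN = [
    "groq",
    "cerebras",
    "sambanova",
    "openrouter/gemini",
    "openrouter/qwen",
    "nvidia-nim",
]

def complete(prompt, send):
    """Walk the chain; send(provider, prompt) raises on any failure."""
    errors = {}
    for provider in FALLBACK_CHAIN:
        try:
            return provider, send(provider, prompt)
        except Exception as exc:  # 429, 400, timeout -- treated alike here
            errors[provider] = exc
    raise RuntimeError(f"all providers failed: {list(errors)}")

# Usage: simulate Groq being down and everything else healthy.
def fake_send(provider, prompt):
    if provider == "groq":
        raise TimeoutError("groq down")
    return f"answer from {provider}"

print(complete("ping", fake_send))  # ('cerebras', 'answer from cerebras')
```

Depth buys resilience against a single outage, but as the cascade showed, it does nothing about correlated load: all six providers can still be swept in one burst.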

Step 3: Don’t fire everything at once. The wakeMode: "now" flag is… still there. I should probably change it. But reducing the job count means even if all 6 fire simultaneously, it’s 6 calls instead of 13. The providers can handle that.

The Deeper Problem

Free-tier AI providers are incredible. Groq gives you 500,000 tokens per day for free. Cerebras, SambaNova, NVIDIA NIM all have generous allocations. OpenRouter has a free tier for many models.

But “free” comes with constraints that are easy to forget:

Daily limits are per-key, not per-request. A burst of 13 requests doesn’t just hit a rate limit — it can exhaust an entire day’s allocation in seconds.

Fallback chains can amplify failure. If your primary provider goes down and your fallback is another free tier, you’re just shifting the load. And if all 5 fallbacks are free tiers, you can cascade through all of them before any single one has recovered.

Provider rate limiting is inconsistent. Groq returns clean 429s. OpenRouter returns 400s with malformed error bodies. Some providers silently drop requests. Some return 200 with empty responses. Your retry logic needs to handle all of these, and “handle” doesn’t mean “retry immediately with the next provider.”
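One way to keep that zoo manageable is to classify every response into an explicit retry decision before acting on it. The failure modes below are the ones listed above; the `(status, body)` shape is an assumption about how the gateway surfaces responses:

```python
# Sketch of response classification for inconsistent providers.
# The failure modes (429s, 400s, silent drops, empty 200s) are from
# the incident; the (status, body) tuple shape is an assumption.
def classify(status, body):
    """Map a provider response to a retry decision."""
    if status == 429:
        return "backoff"   # clean rate limit: wait, then retry same provider
    if status == 400:
        return "failover"  # malformed-request bug: try the next provider
    if status is None:
        return "backoff"   # silently dropped: treat as overload, not failure
    if status == 200 and not body.strip():
        return "backoff"   # success code but no answer: don't trust it
    return "ok"

print(classify(429, ""))       # backoff
print(classify(200, "hello"))  # ok
```

The key design choice is that "backoff" and "failover" are different verbs: backing off protects the provider you're on, failing over protects your job. Collapsing both into "retry immediately with the next provider" is what turns one rate limit into five.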

wakeMode is a footgun. Any “fire everything on startup” pattern is dangerous when “everything” means “all the jobs that would normally spread across 24 hours.” Startup burst — the classic thundering-herd problem — is well known in distributed systems. I should’ve known better.

What I’d Do Differently

Stagger the cron jobs on startup. Instead of wakeMode: "now", use wakeMode: "scheduled" and let each job wait for its next natural trigger. Or add a random delay to each job’s first execution — 30 seconds to 5 minutes — so they don’t all fire at once.
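The jitter option is a one-liner per job. A sketch, assuming the 30-second-to-5-minute window above (the helper name is mine, not an OpenClaw feature):

```python
import random

# Per-job startup jitter, sketched. The 30 s - 300 s window is from
# the suggestion above; jittered_delay() is a hypothetical helper,
# not an OpenClaw config option.
def jittered_delay(min_s: float = 30.0, max_s: float = 300.0) -> float:
    """Random delay before a job's first post-startup execution."""
    return random.uniform(min_s, max_s)

# Six jobs get six independent delays, spreading the startup burst
# across nearly five minutes instead of one instant.
delays = sorted(jittered_delay() for _ in range(6))
print(all(30.0 <= d <= 300.0 for d in delays))  # True
```

Even a crude uniform jitter like this changes the provider-side picture from "6 simultaneous requests" to "6 requests over several minutes," which is the difference between a burst and background noise.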

Monitor token usage, not just request success. I had health checks that verified “did the job complete?” but not “how many tokens did that cost?” If I’d been tracking daily token burn rate, I would’ve seen the 63-call-per-day number and realized it was right at the limit.
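A token meter is a small amount of code. In this sketch, only the 500K Groq limit comes from the incident; the class shape, the 80% alert threshold, and the ~8K tokens-per-call figure are assumptions:

```python
from collections import defaultdict
import datetime

# Sketch of a daily token-burn tracker. Only the 500K/day limit is
# from the incident; the alert threshold and per-call token figure
# are assumptions.
class TokenMeter:
    def __init__(self, daily_limit: int, alert_at: float = 0.8):
        self.daily_limit = daily_limit
        self.alert_at = alert_at
        self.usage = defaultdict(int)  # date -> tokens burned

    def record(self, tokens: int, day=None) -> bool:
        """Add one call's token count; True means over the alert line."""
        day = day or datetime.date.today()
        self.usage[day] += tokens
        return self.usage[day] >= self.daily_limit * self.alert_at

meter = TokenMeter(daily_limit=500_000)
today = datetime.date.today()
for _ in range(63):                       # a day's worth of calls
    alerted = meter.record(8_000, today)  # assumed ~8K tokens/call
print(meter.usage[today], alerted)        # 504000 True
```

With this running, the 63-call schedule would have tripped the alert every single day, well before any restart compressed those calls into ten seconds.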

Pick one or two providers for primary use and keep the rest genuinely as fallbacks. Spreading load across 5 free tiers sounds resilient but actually means you’re dependent on 5 services maintaining their free-tier policies. If any one changes their limits, your math breaks.

The Status Now

Six jobs. Nine calls per day. Five-deep fallback chains. No more cascade failures.

It’s probably the most expensive container restart I’ve ever done, measured in burned API credits and debugging time. All because of a config flag that said “yes, please fire everything immediately.”

Lesson learned. The hard way. As usual.