OpenClaw 400 Loop: How Parallel Tool Calls Corrupt Your Session
The Nightmare Scenario: Your OpenClaw agent was running multiple tools in parallel — maybe web search + file read + API call all at once — and then something interrupted it. Maybe you sent /stop, maybe the network dropped, maybe it hit a timeout. Now every message you send returns a 400 error. You try /reset but the session is still broken. The 400 persists across restarts. Your agent is stuck in a permanent loop. This is the parallel tool call corruption bug, and this guide covers exactly what happened and every way to escape it.
This is one of the nastiest bugs in OpenClaw's architecture. Unlike most errors that clear on restart, this one creates corruption that survives restarts because the damage is written to the persisted session file. Every time OpenClaw loads the session and tries to send it to the AI API, the malformed history causes the API to return a 400 error. The session is caught in a loop it cannot self-escape.
The bug has been documented in two separate GitHub issues: #28661 ("Parallel tool_use / tool_result mismatch permanently corrupts session state") and #37834 ("Session context corruption: orphaned tool_use ID causes permanent 400 loop after abort"). Both describe the same class of problem from different angles. Together they've received hundreds of reports from users who thought they had bricked their OpenClaw installation but had simply encountered session corruption.
How Parallel Tool Calls Work in OpenClaw
To understand how this breaks, you first need to understand how tool calls work. When you give OpenClaw a complex task — like "research these 5 companies and summarize their financials" — the AI model may decide to execute multiple tool calls simultaneously to save time. Each tool call has a unique ID and follows a strict pairing requirement:
// Message 1: Model requests 3 parallel tool calls
{
  "role": "assistant",
  "content": [
    {"type": "tool_use", "id": "tool_abc", "name": "web_search", "input": {...}},
    {"type": "tool_use", "id": "tool_def", "name": "read_file", "input": {...}},
    {"type": "tool_use", "id": "tool_ghi", "name": "http_request", "input": {...}}
  ]
}

// Message 2: System provides results for ALL 3 tool calls
{
  "role": "user",
  "content": [
    {"type": "tool_result", "tool_use_id": "tool_abc", "content": "..."},
    {"type": "tool_result", "tool_use_id": "tool_def", "content": "..."},
    {"type": "tool_result", "tool_use_id": "tool_ghi", "content": "..."}
    // All 3 IDs must be accounted for. Missing = 400 error.
  ]
}
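The pairing rule can be expressed in a few lines of Python. This is an illustrative sketch of the invariant the API enforces, not OpenClaw's actual validation code:

```python
def validate_tool_pairing(messages):
    """Return the set of orphaned tool_use IDs (empty set = valid).

    Sketch of the API's pairing invariant: every tool_use ID must be
    matched by a tool_result. Not OpenClaw's actual validation code.
    """
    tool_use_ids, tool_result_ids = set(), set()
    for msg in messages:
        content = msg.get("content", [])
        if not isinstance(content, list):
            continue
        for block in content:
            if block.get("type") == "tool_use":
                tool_use_ids.add(block["id"])
            elif block.get("type") == "tool_result":
                tool_result_ids.add(block.get("tool_use_id"))
    return tool_use_ids - tool_result_ids

# A history where tool_ghi was called but never got a result:
history = [
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "tool_abc", "name": "web_search", "input": {}},
        {"type": "tool_use", "id": "tool_ghi", "name": "http_request", "input": {}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "tool_abc", "content": "..."},
    ]},
]
print(validate_tool_pairing(history))  # {'tool_ghi'}
```

A non-empty return value is exactly the condition that makes the API reject the whole history with a 400.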
How Corruption Happens
Corruption occurs when the pairing between tool_use and tool_result is broken. This can happen in several ways:
Scenario A: User Sends /stop Mid-Execution
The model has issued 3 tool_use calls. Tool 1 and 2 complete. You send /stop. Tool 3 is aborted. OpenClaw writes tool_result for tools 1 and 2 to session history but never writes a tool_result for tool 3. The session now has an orphaned tool_use ID — a tool that was called but never returned a result.
Session state after abort:
tool_use: id="tool_abc" ← has result ✓
tool_use: id="tool_def" ← has result ✓
tool_use: id="tool_ghi" ← NO RESULT (orphaned) ✗ → 400 error forever
Scenario B: Network Drop During Tool Execution
OpenClaw was executing tools and the network connection dropped. Tool calls were issued to the model, but the results from the tools couldn't be sent back due to the disconnection. When the connection restores, OpenClaw tries to resume but the tool_use IDs are already in history without matching tool_results.
Scenario C: Tool Timeout During Parallel Execution
One of the parallel tools timed out (for example, an HTTP request that took too long). OpenClaw wrote the tool_use to history but the timeout prevented the tool_result from being written. Same orphaned ID problem, different trigger.
Scenario D: ID Mismatch from Race Condition
In high-concurrency scenarios (many sub-agents running in parallel), tool_use IDs can get swapped or duplicated across sessions, causing the wrong tool_result to be paired with a tool_use. The pairing exists but it's incorrect, which causes the API to return a validation error.
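The diagnostic below looks for tool_use IDs without results; Scenario D can also produce the inverse defect, tool_result entries whose tool_use_id matches nothing. A reverse check like this sketch (assuming the same session JSON shape as the scripts in this guide) covers that case:

```python
def find_dangling_results(session):
    """Report tool_result IDs that match no tool_use in the session.

    The inverse of the orphaned-tool_use check: catches swapped or
    duplicated IDs from race conditions. A sketch assuming the same
    session JSON shape used by the scripts in this guide.
    """
    use_ids, result_ids = set(), set()
    for msg in session.get("messages", []):
        content = msg.get("content", [])
        if not isinstance(content, list):
            continue
        for block in content:
            if block.get("type") == "tool_use":
                use_ids.add(block["id"])
            elif block.get("type") == "tool_result":
                result_ids.add(block.get("tool_use_id"))
    return result_ids - use_ids
```

An empty set from both checks (orphaned tool_use IDs and dangling tool_result IDs) means the pairing itself is intact.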
Diagnosing the Corruption
Confirm It's Tool Call Corruption
# Step 1: Identify the session file
openclaw session list
# Find the active or most recent session
# Step 2: Look for orphaned tool_use IDs using Python
python3 <<'EOF'
import json

session_file = "/Users/YOU/.openclaw/sessions/SESSION_ID.json"
with open(session_file, "r") as f:
    session = json.load(f)

tool_use_ids = set()
tool_result_ids = set()
for msg in session.get("messages", []):
    content = msg.get("content", [])
    if isinstance(content, list):
        for block in content:
            if block.get("type") == "tool_use":
                tool_use_ids.add(block["id"])
            elif block.get("type") == "tool_result":
                tool_result_ids.add(block.get("tool_use_id", ""))

orphaned = tool_use_ids - tool_result_ids
if orphaned:
    print(f"CORRUPTED: {len(orphaned)} orphaned tool_use IDs found:")
    for oid in orphaned:
        print(f"  - {oid}")
else:
    print("Session looks clean. Tool call corruption is not the issue.")
EOF
Fix Method 1: Automated Session Repair Script
This Python script finds all orphaned tool_use IDs in your session file and injects synthetic tool_result entries to "complete" the broken pairs. This allows the API to accept the session history again without losing the surrounding conversation.
Session Repair Script
python3 <<'EOF'
import json
import shutil
from collections import defaultdict

session_file = "/Users/YOU/.openclaw/sessions/SESSION_ID.json"

# Back up the ORIGINAL file before modifying anything
backup_file = session_file.replace(".json", ".backup.json")
shutil.copyfile(session_file, backup_file)

with open(session_file, "r") as f:
    session = json.load(f)
messages = session.get("messages", [])

# Map each tool_use ID to the index of the message that issued it,
# and collect every tool_result ID
tool_use_ids = {}  # id -> message index
tool_result_ids = set()
for i, msg in enumerate(messages):
    content = msg.get("content", [])
    if isinstance(content, list):
        for block in content:
            if block.get("type") == "tool_use":
                tool_use_ids[block["id"]] = i
            elif block.get("type") == "tool_result":
                tool_result_ids.add(block.get("tool_use_id", ""))

orphaned = {k: v for k, v in tool_use_ids.items() if k not in tool_result_ids}
print(f"Found {len(orphaned)} orphaned tool_use IDs")

if not orphaned:
    print("No repair needed.")
else:
    # Group orphans by the assistant message that issued them, so each
    # message gets a single batch of synthetic results
    by_msg = defaultdict(list)
    for orphan_id, msg_idx in orphaned.items():
        print(f"Patching orphaned ID: {orphan_id}")
        by_msg[msg_idx].append(orphan_id)

    # Walk in descending index order so inserts don't shift the
    # positions of messages we still need to patch
    for msg_idx in sorted(by_msg, reverse=True):
        synthetic = [{
            "type": "tool_result",
            "tool_use_id": oid,
            "content": "[Tool execution was interrupted. Result unavailable.]",
            "is_error": True
        } for oid in by_msg[msg_idx]]
        nxt = messages[msg_idx + 1] if msg_idx + 1 < len(messages) else None
        if (nxt and nxt.get("role") == "user" and isinstance(nxt.get("content"), list)
                and any(b.get("type") == "tool_result" for b in nxt["content"])):
            # A partial results message already exists: complete it
            nxt["content"].extend(synthetic)
        else:
            # No following result message: insert one
            messages.insert(msg_idx + 1, {"role": "user", "content": synthetic})

    session["messages"] = messages
    with open(session_file, "w") as f:
        json.dump(session, f, indent=2)
    print(f"Session repaired. Backup of the original saved to: {backup_file}")
EOF
After running this script, restart the gateway: openclaw gateway restart
Fix Method 2: Nuclear Reset (Fastest)
If you don't need to preserve the session content, the fastest fix is a complete session reset. This bypasses the corrupted history entirely and starts fresh. Your memories and configuration are preserved.
Hard Reset Commands
# Method A: Via chat interface
/reset
# Method B: Via CLI (forceful)
openclaw session reset --force
# Method C: Manually delete the session file
openclaw gateway stop
rm ~/.openclaw/sessions/[corrupted-session-id].json
openclaw gateway start
# Method D: Reset all sessions (nuclear option)
openclaw gateway stop
rm ~/.openclaw/sessions/*.json
openclaw gateway start
# Note: Memories are in a different file and NOT deleted by these commands
# ls ~/.openclaw/memories/ - your memories survive
Prevention: Stop This From Happening Again
The best fix is avoiding corruption in the first place. Here are the most effective prevention strategies:
Prevention 1: Never Use /stop During Parallel Tool Execution
The most common trigger is interrupting parallel tool calls mid-execution. Instead of /stop, wait for the current tool batch to complete. If you need to stop urgently, use /stop --after-tools, which signals the agent to stop but lets all in-flight tool calls finish before halting.
# Safe stop (waits for tool calls to complete):
/stop --after-tools
# Risky stop (aborts mid-execution, may corrupt):
/stop
Prevention 2: Limit Parallel Tool Concurrency
Reduce the number of tools that can run in parallel. This reduces the blast radius if something goes wrong — fewer orphaned IDs to deal with.
// In openclaw.json5
{
  "execution": {
    "maxParallelTools": 2,  // Default may be 5+
    "toolTimeout": 30000    // 30 second timeout per tool
  }
}
Prevention 3: Enable Checkpoint Saving
Configure OpenClaw to save session checkpoints between tool call batches (not mid-batch). If corruption occurs, you can roll back to the last clean checkpoint instead of losing the entire session.
// In openclaw.json5
{
  "session": {
    "checkpoints": true,
    "checkpointOnToolBatchComplete": true,
    "maxCheckpoints": 5
  }
}
Prevention 4: Stable Network Connection
Network drops during parallel tool execution are a major cause of orphaned IDs. A stable, high-speed network connection dramatically reduces the risk of mid-execution interruption. This is especially important when running OpenClaw for long-running automated tasks where dozens of parallel tool calls may execute over hours.
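One practical mitigation is a pre-flight connectivity probe before kicking off a long parallel run. A rough sketch; the host, port, and thresholds here are example values, not OpenClaw settings:

```python
import socket
import time

def connection_is_stable(host="api.example.com", port=443,
                         probes=5, max_latency_s=0.5, interval_s=1.0):
    """Probe a TCP endpoint several times before launching a long run.

    Returns True only if every probe connects within max_latency_s.
    Sketch with placeholder host/thresholds; point it at whatever
    API endpoint your agent actually depends on.
    """
    for _ in range(probes):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=max_latency_s):
                pass  # connection succeeded; we only care about latency
        except OSError:
            return False  # refused, unreachable, or timed out
        if time.monotonic() - start > max_latency_s:
            return False  # connected, but too slowly
        time.sleep(interval_s)
    return True
```

Gating a multi-hour automated task on a check like this costs a few seconds and avoids starting a parallel batch on a link that is already flaky.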
Understanding the 400 Loop: Why Restart Doesn't Fix It
A common question is: "Why doesn't restarting the gateway fix this?" The answer is that OpenClaw's session history is persisted to disk. When the gateway restarts, it reloads the session from the file on disk. If the file on disk contains orphaned tool_use IDs, the session will be corrupted after every restart, forever, until the file is repaired or deleted.
Restart 1:
Load session from disk → contains orphaned tool_use IDs
Send to API → API returns 400 ✗
Restart 2:
Load session from disk → SAME FILE, same orphaned IDs
Send to API → API returns 400 ✗
Restart N: (same result forever)
The corruption is in the FILE, not in memory
Restart clears memory, not the file on disk
Identifying Which Session Files Are Corrupted
If you've been running OpenClaw for a while with extensive parallel tool usage, you may have multiple session files that are corrupted. Before running repair scripts, it's worth scanning all session files to get the full picture.
Bulk Session Health Check
python3 <<'EOF'
import json, os, glob

sessions_dir = os.path.expanduser("~/.openclaw/sessions/")
session_files = glob.glob(f"{sessions_dir}*.json")
print(f"Scanning {len(session_files)} session files...\n")

for filepath in sorted(session_files):
    try:
        with open(filepath, "r") as f:
            session = json.load(f)
        tool_use_ids = set()
        tool_result_ids = set()
        for msg in session.get("messages", []):
            content = msg.get("content", [])
            if isinstance(content, list):
                for block in content:
                    if block.get("type") == "tool_use":
                        tool_use_ids.add(block["id"])
                    elif block.get("type") == "tool_result":
                        tool_result_ids.add(block.get("tool_use_id", ""))
        orphaned = tool_use_ids - tool_result_ids
        filename = os.path.basename(filepath)
        if orphaned:
            print(f"❌ CORRUPTED: {filename} ({len(orphaned)} orphaned IDs)")
        else:
            print(f"✓ Clean: {filename}")
    except Exception as e:
        print(f"⚠️ Error reading {os.path.basename(filepath)}: {e}")
EOF
When to Repair vs. When to Reset
Not every corrupted session is worth repairing. Here's a practical framework for deciding:
Repair the session when:
- You were midway through a complex multi-hour task
- The session contains research or analysis results you need
- Re-running the task from scratch would take significant time or cost
- The conversation has critical context built up over many turns
Reset and start fresh when:
- The task was routine and can easily be restarted
- Speed of recovery is more important than preserving context
- The session was near its end anyway and almost nothing is lost
- You're not comfortable editing JSON files manually
The Deeper Problem: OpenClaw's Tool Result Handling
The underlying issue is that OpenClaw currently doesn't have a graceful abort mechanism for parallel tool calls. When any abort signal arrives (user /stop, timeout, network drop), OpenClaw should ideally write synthetic "tool was aborted" results for all in-progress tool calls before halting. Instead, it halts mid-write, leaving the session in an inconsistent state.
This is a known architectural limitation being addressed in the OpenClaw roadmap. The GitHub issues #28661 and #37834 both have comments from the core team acknowledging that a proper graceful abort mechanism is needed. Until that fix is released, the workarounds above are the practical solution.
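Roughly, a graceful abort would synthesize error results for every in-flight tool call before persisting the session, so the history always satisfies the pairing rule. This is a sketch of the idea proposed in the issues, not shipped OpenClaw code:

```python
def abort_gracefully(messages, pending_tool_ids):
    """Close out in-flight tool calls before halting.

    On any abort signal (user stop, timeout, network drop), append
    synthetic error results for every tool_use that has not completed,
    so the persisted history still satisfies the pairing rule.
    Sketch of the fix discussed in the GitHub issues, not shipped code.
    """
    if pending_tool_ids:
        messages.append({
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tid,
                    "content": "[Tool execution was aborted before completion.]",
                    "is_error": True,
                }
                for tid in pending_tool_ids
            ],
        })
    return messages
```

The key property is ordering: the synthetic results must be written before the halt, not after, so that even a crash during shutdown leaves a loadable session.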
For users who run OpenClaw in production environments — handling real business tasks, customer communications, or financial operations — this behavior is unacceptable for fully autonomous operation. The recommended mitigation is to combine the prevention strategies above with a monitoring setup that alerts you when the 400 loop begins, so you can intervene quickly before the corruption persists for hours.
Set Up 400 Loop Monitoring
# Add to your HEARTBEAT.md:
## Health Monitoring
Every 10 minutes, silently check: did the last 3 responses
all return 400 errors? If yes, send me an alert:
"⚠️ OpenClaw may be in a 400 loop. Last 3 responses were all
400 errors. Possible session corruption. Consider /reset."
# Or set up external monitoring via cron:
# cron: every 5 minutes, ping openclaw status endpoint
# if response is 400 three times in a row, send Telegram alert
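The consecutive-400 rule in that cron idea reduces to a small counter. A sketch; how you poll the status endpoint and deliver the Telegram alert is deployment-specific:

```python
from collections import deque

class LoopDetector:
    """Track recent HTTP status codes; alert after N consecutive 400s.

    Sketch for the cron-based monitor described above. Fetching the
    status code (OpenClaw status endpoint) and sending the alert
    (Telegram, etc.) are left to your deployment.
    """

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.recent = deque(maxlen=threshold)  # only the last N codes matter

    def record(self, status_code):
        """Record one probe result; return True if an alert should fire."""
        self.recent.append(status_code)
        return (len(self.recent) == self.threshold
                and all(c == 400 for c in self.recent))
```

Because the deque is bounded, a single successful response resets the streak and the detector stays quiet until another full run of 400s accumulates.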
How Network Stability Prevents This Class of Bug
The most reliable way to prevent parallel tool call corruption is to ensure that when the AI model issues tool calls and OpenClaw begins executing them, the connection remains stable for the entire duration. A dropped connection mid-execution is the most common cause of orphaned tool IDs.
VPN07 provides consistently stable, 1000Mbps connections across 70+ countries. For users running OpenClaw as a 24/7 personal assistant or automation engine, this stability is critical. When a network-sensitive task like parallel API calls is interrupted by an unstable connection, the resulting session corruption can take more time to recover from than the original task would have taken to complete. A stable VPN connection is not a luxury for serious OpenClaw users — it's infrastructure.
Network Stability Checklist for Parallel Tool Users
- For home server deployments, use Ethernet, not Wi-Fi, for long-running agents.
- Enable VPN keepalive packets to prevent idle timeouts during long tool executions.
- Configure toolTimeout: 30000 to fail fast rather than hang indefinitely.
- Never interrupt parallel execution with a hard stop unless absolutely necessary.
Complete Recovery & Prevention Checklist
Recovery (If Already Corrupted)
- Stop the gateway: openclaw gateway stop
- Scan your session files for orphaned tool_use IDs (bulk health check above)
- Repair the corrupted session file with the repair script, or delete it, then restart the gateway
Prevention (Going Forward)
- Use /stop --after-tools instead of /stop
- Set maxParallelTools: 2 to limit concurrency exposure
VPN07 — The Reliable Foundation for OpenClaw
Stop network drops from corrupting your parallel tool executions
The most effective prevention for parallel tool call corruption is a network that never drops mid-execution. VPN07 has been operating for 10 years with proven stability, delivering 1000Mbps bandwidth across 70+ countries. For users running OpenClaw as a 24/7 automation engine, a rock-solid network connection isn't optional: it's the difference between a productive agent and an afternoon spent repairing corrupted sessions. Try it risk-free with a 30-day money-back guarantee.