OpenClaw 400 Loop: How Parallel Tool Calls Corrupt Your Session
The Nightmare Scenario: Your OpenClaw agent was running multiple tools in parallel — maybe web search + file read + API call all at once — and then something interrupted it. Maybe you sent /stop, maybe the network dropped, maybe it hit a timeout. Now every message you send returns a 400 error. You try /reset but the session is still broken. The 400 persists across restarts. Your agent is stuck in a permanent loop. This is the parallel tool call corruption bug, and this guide covers exactly what happened and every way to escape it.
This is one of the nastiest bugs in OpenClaw's architecture. Unlike most errors that clear on restart, this one creates corruption that survives restarts because the damage is written to the persisted session file. Every time OpenClaw loads the session and tries to send it to the AI API, the malformed history causes the API to return a 400 error. The session is caught in a loop it cannot self-escape.
The bug has been documented in two separate GitHub issues: #28661 ("Parallel tool_use / tool_result mismatch permanently corrupts session state") and #37834 ("Session context corruption: orphaned tool_use ID causes permanent 400 loop after abort"). Both describe the same class of problem from different angles. Together they've received hundreds of reports from users who thought they had bricked their OpenClaw installation but had simply encountered session corruption.
How Parallel Tool Calls Work in OpenClaw
To understand how this breaks, you first need to understand how tool calls work. When you give OpenClaw a complex task — like "research these 5 companies and summarize their financials" — the AI model may decide to execute multiple tool calls simultaneously to save time. Each tool call has a unique ID and follows a strict pairing requirement:
// Message 1: Model requests 3 parallel tool calls
{
  "role": "assistant",
  "content": [
    {"type": "tool_use", "id": "tool_abc", "name": "web_search", "input": {...}},
    {"type": "tool_use", "id": "tool_def", "name": "read_file", "input": {...}},
    {"type": "tool_use", "id": "tool_ghi", "name": "http_request", "input": {...}}
  ]
}

// Message 2: System provides results for ALL 3 tool calls
{
  "role": "user",
  "content": [
    {"type": "tool_result", "tool_use_id": "tool_abc", "content": "..."},
    {"type": "tool_result", "tool_use_id": "tool_def", "content": "..."},
    {"type": "tool_result", "tool_use_id": "tool_ghi", "content": "..."}
    // All 3 IDs must be accounted for. Missing = 400 error.
  ]
}
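The pairing rule can be expressed in a few lines of Python. This is an illustrative sketch of the invariant the API enforces, not OpenClaw's actual validation code:

```python
def validate_tool_pairing(messages):
    """Return the set of orphaned tool_use IDs (empty set = valid).

    Sketch of the API's pairing invariant: every tool_use ID must be
    matched by a tool_result. Not OpenClaw's actual validation code.
    """
    tool_use_ids, tool_result_ids = set(), set()
    for msg in messages:
        content = msg.get("content", [])
        if not isinstance(content, list):
            continue
        for block in content:
            if block.get("type") == "tool_use":
                tool_use_ids.add(block["id"])
            elif block.get("type") == "tool_result":
                tool_result_ids.add(block.get("tool_use_id"))
    return tool_use_ids - tool_result_ids

# A history where tool_ghi was called but never got a result:
history = [
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "tool_abc", "name": "web_search", "input": {}},
        {"type": "tool_use", "id": "tool_ghi", "name": "http_request", "input": {}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "tool_abc", "content": "..."},
    ]},
]
print(validate_tool_pairing(history))  # {'tool_ghi'}
```

A non-empty return value is exactly the condition that makes the API reject the whole history with a 400.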
How Corruption Happens
Corruption occurs when the pairing between tool_use and tool_result is broken. This can happen in several ways:
Scenario A: User Sends /stop Mid-Execution
The model has issued 3 tool_use calls. Tool 1 and 2 complete. You send /stop. Tool 3 is aborted. OpenClaw writes tool_result for tools 1 and 2 to session history but never writes a tool_result for tool 3. The session now has an orphaned tool_use ID — a tool that was called but never returned a result.
Session state after abort:
tool_use: id="tool_abc" ← has result ✓
tool_use: id="tool_def" ← has result ✓
tool_use: id="tool_ghi" ← NO RESULT (orphaned) ✗ → 400 error forever
Scenario B: Network Drop During Tool Execution
OpenClaw was executing tools and the network connection dropped. Tool calls were issued to the model, but the results from the tools couldn't be sent back due to the disconnection. When the connection restores, OpenClaw tries to resume but the tool_use IDs are already in history without matching tool_results.
Scenario C: Tool Timeout During Parallel Execution
One of the parallel tools timed out (for example, an HTTP request that took too long). OpenClaw wrote the tool_use to history but the timeout prevented the tool_result from being written. Same orphaned ID problem, different trigger.
Scenario D: ID Mismatch from Race Condition
In high-concurrency scenarios (many sub-agents running in parallel), tool_use IDs can get swapped or duplicated across sessions, causing the wrong tool_result to be paired with a tool_use. The pairing exists but it's incorrect, which causes the API to return a validation error.
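The diagnostic below looks for tool_use IDs without results; Scenario D can also produce the inverse defect, tool_result entries whose tool_use_id matches nothing. A reverse check like this sketch (assuming the same session JSON shape as the scripts in this guide) covers that case:

```python
def find_dangling_results(session):
    """Report tool_result IDs that match no tool_use in the session.

    The inverse of the orphaned-tool_use check: catches swapped or
    duplicated IDs from race conditions. A sketch assuming the same
    session JSON shape used by the scripts in this guide.
    """
    use_ids, result_ids = set(), set()
    for msg in session.get("messages", []):
        content = msg.get("content", [])
        if not isinstance(content, list):
            continue
        for block in content:
            if block.get("type") == "tool_use":
                use_ids.add(block["id"])
            elif block.get("type") == "tool_result":
                result_ids.add(block.get("tool_use_id"))
    return result_ids - use_ids
```

An empty set from both checks (orphaned tool_use IDs and dangling tool_result IDs) means the pairing itself is intact.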
Diagnosing the Corruption
Confirm It's Tool Call Corruption
# Step 1: Identify the session file
openclaw session list
# Find the active or most recent session
# Step 2: Look for orphaned tool_use IDs using Python
python3 <<'EOF'
import json

session_file = "/Users/YOU/.openclaw/sessions/SESSION_ID.json"
with open(session_file, "r") as f:
    session = json.load(f)

tool_use_ids = set()
tool_result_ids = set()
for msg in session.get("messages", []):
    content = msg.get("content", [])
    if isinstance(content, list):
        for block in content:
            if block.get("type") == "tool_use":
                tool_use_ids.add(block["id"])
            elif block.get("type") == "tool_result":
                tool_result_ids.add(block.get("tool_use_id", ""))

orphaned = tool_use_ids - tool_result_ids
if orphaned:
    print(f"CORRUPTED: {len(orphaned)} orphaned tool_use IDs found:")
    for oid in orphaned:
        print(f"  - {oid}")
else:
    print("Session looks clean. Tool call corruption is not the issue.")
EOF
Fix Method 1: Automated Session Repair Script
This Python script finds all orphaned tool_use IDs in your session file and injects synthetic tool_result entries to "complete" the broken pairs. This allows the API to accept the session history again without losing the surrounding conversation.
Session Repair Script
python3 <<'EOF'
import json
import shutil
from collections import defaultdict

session_file = "/Users/YOU/.openclaw/sessions/SESSION_ID.json"

# Back up the ORIGINAL file before modifying anything
backup_file = session_file.replace(".json", ".backup.json")
shutil.copyfile(session_file, backup_file)

with open(session_file, "r") as f:
    session = json.load(f)
messages = session.get("messages", [])

# Map each tool_use ID to the index of the message that issued it,
# and collect every tool_result ID
tool_use_ids = {}  # id -> message index
tool_result_ids = set()
for i, msg in enumerate(messages):
    content = msg.get("content", [])
    if isinstance(content, list):
        for block in content:
            if block.get("type") == "tool_use":
                tool_use_ids[block["id"]] = i
            elif block.get("type") == "tool_result":
                tool_result_ids.add(block.get("tool_use_id", ""))

orphaned = {k: v for k, v in tool_use_ids.items() if k not in tool_result_ids}
print(f"Found {len(orphaned)} orphaned tool_use IDs")

if not orphaned:
    print("No repair needed.")
else:
    # Group orphans by the assistant message that issued them, so each
    # message gets a single batch of synthetic results
    by_msg = defaultdict(list)
    for orphan_id, msg_idx in orphaned.items():
        print(f"Patching orphaned ID: {orphan_id}")
        by_msg[msg_idx].append(orphan_id)

    # Walk in descending index order so inserts don't shift the
    # positions of messages we still need to patch
    for msg_idx in sorted(by_msg, reverse=True):
        synthetic = [{
            "type": "tool_result",
            "tool_use_id": oid,
            "content": "[Tool execution was interrupted. Result unavailable.]",
            "is_error": True
        } for oid in by_msg[msg_idx]]
        nxt = messages[msg_idx + 1] if msg_idx + 1 < len(messages) else None
        if (nxt and nxt.get("role") == "user" and isinstance(nxt.get("content"), list)
                and any(b.get("type") == "tool_result" for b in nxt["content"])):
            # A partial results message already exists: complete it
            nxt["content"].extend(synthetic)
        else:
            # No following result message: insert one
            messages.insert(msg_idx + 1, {"role": "user", "content": synthetic})

    session["messages"] = messages
    with open(session_file, "w") as f:
        json.dump(session, f, indent=2)
    print(f"Session repaired. Backup of the original saved to: {backup_file}")
EOF
After running this script, restart the gateway: openclaw gateway restart
Fix Method 2: Nuclear Reset (Fastest)
If you don't need to preserve the session content, the fastest fix is a complete session reset. This bypasses the corrupted history entirely and starts fresh. Your memories and configuration are preserved.
Hard Reset Commands
# Method A: Via chat interface
/reset
# Method B: Via CLI (forceful)
openclaw session reset --force
# Method C: Manually delete the session file
openclaw gateway stop
rm ~/.openclaw/sessions/[corrupted-session-id].json
openclaw gateway start
# Method D: Reset all sessions (nuclear option)
openclaw gateway stop
rm ~/.openclaw/sessions/*.json
openclaw gateway start
# Note: Memories are in a different file and NOT deleted by these commands
# ls ~/.openclaw/memories/ - your memories survive
Prevention: Stop This From Happening Again
The best fix is avoiding corruption in the first place. Here are the most effective prevention strategies:
Prevention 1: Never Use /stop During Parallel Tool Execution
The most common trigger is interrupting parallel tool calls mid-execution. Instead of /stop, wait for the current tool batch to complete. If you need to stop urgently, use /stop --after-tools, which signals the agent to stop but lets all in-flight tool calls finish before halting.
# Safe stop (waits for tool calls to complete):
/stop --after-tools
# Risky stop (aborts mid-execution, may corrupt):
/stop
Prevention 2: Limit Parallel Tool Concurrency
Reduce the number of tools that can run in parallel. This reduces the blast radius if something goes wrong — fewer orphaned IDs to deal with.
// In openclaw.json5
{
  "execution": {
    "maxParallelTools": 2,  // Default may be 5+
    "toolTimeout": 30000    // 30 second timeout per tool
  }
}
Prevention 3: Enable Checkpoint Saving
Configure OpenClaw to save session checkpoints between tool call batches (not mid-batch). If corruption occurs, you can roll back to the last clean checkpoint instead of losing the entire session.
// In openclaw.json5
{
  "session": {
    "checkpoints": true,
    "checkpointOnToolBatchComplete": true,
    "maxCheckpoints": 5
  }
}
Prevention 4: Stable Network Connection
Network drops during parallel tool execution are a major cause of orphaned IDs. A stable, high-speed network connection dramatically reduces the risk of mid-execution interruption. This is especially important when running OpenClaw for long-running automated tasks where dozens of parallel tool calls may execute over hours.
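One practical mitigation is a pre-flight connectivity probe before kicking off a long parallel run. A rough sketch; the host, port, and thresholds here are example values, not OpenClaw settings:

```python
import socket
import time

def connection_is_stable(host="api.example.com", port=443,
                         probes=5, max_latency_s=0.5, interval_s=1.0):
    """Probe a TCP endpoint several times before launching a long run.

    Returns True only if every probe connects within max_latency_s.
    Sketch with placeholder host/thresholds; point it at whatever
    API endpoint your agent actually depends on.
    """
    for _ in range(probes):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=max_latency_s):
                pass  # connection succeeded; we only care about latency
        except OSError:
            return False  # refused, unreachable, or timed out
        if time.monotonic() - start > max_latency_s:
            return False  # connected, but too slowly
        time.sleep(interval_s)
    return True
```

Gating a multi-hour automated task on a check like this costs a few seconds and avoids starting a parallel batch on a link that is already flaky.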
Understanding the 400 Loop: Why Restart Doesn't Fix It
A common question is: "Why doesn't restarting the gateway fix this?" The answer is that OpenClaw's session history is persisted to disk. When the gateway restarts, it reloads the session from the file on disk. If the file on disk contains orphaned tool_use IDs, the session will be corrupted after every restart, forever, until the file is repaired or deleted.
Restart 1:
Load session from disk → contains orphaned tool_use IDs
Send to API → API returns 400 ✗
Restart 2:
Load session from disk → SAME FILE, same orphaned IDs
Send to API → API returns 400 ✗
Restart N: (same result forever)
The corruption is in the FILE, not in memory
Restart clears memory, not the file on disk
Identifying Which Session Files Are Corrupted
If you've been running OpenClaw for a while with extensive parallel tool usage, you may have multiple session files that are corrupted. Before running repair scripts, it's worth scanning all session files to get the full picture.
Bulk Session Health Check
python3 <<'EOF'
import json, os, glob

sessions_dir = os.path.expanduser("~/.openclaw/sessions/")
session_files = glob.glob(f"{sessions_dir}*.json")
print(f"Scanning {len(session_files)} session files...\n")

for filepath in sorted(session_files):
    try:
        with open(filepath, "r") as f:
            session = json.load(f)
        tool_use_ids = set()
        tool_result_ids = set()
        for msg in session.get("messages", []):
            content = msg.get("content", [])
            if isinstance(content, list):
                for block in content:
                    if block.get("type") == "tool_use":
                        tool_use_ids.add(block["id"])
                    elif block.get("type") == "tool_result":
                        tool_result_ids.add(block.get("tool_use_id", ""))
        orphaned = tool_use_ids - tool_result_ids
        filename = os.path.basename(filepath)
        if orphaned:
            print(f"❌ CORRUPTED: {filename} ({len(orphaned)} orphaned IDs)")
        else:
            print(f"✓ Clean: {filename}")
    except Exception as e:
        print(f"⚠️ Error reading {os.path.basename(filepath)}: {e}")
EOF
When to Repair vs. When to Reset
Not every corrupted session is worth repairing. Here's a practical framework for deciding:
Repair the session when:
- You were midway through a complex multi-hour task
- The session contains research or analysis results you need
- Re-running the task from scratch would take significant time or cost
- The conversation has critical context built up over many turns
Reset and start fresh when:
- The task was routine and can easily be restarted
- Speed of recovery is more important than preserving context
- The session was near its end anyway and almost nothing is lost
- You're not comfortable editing JSON files manually
The Deeper Problem: OpenClaw's Tool Result Handling
The underlying issue is that OpenClaw currently doesn't have a graceful abort mechanism for parallel tool calls. When any abort signal arrives (user /stop, timeout, network drop), OpenClaw should ideally write synthetic "tool was aborted" results for all in-progress tool calls before halting. Instead, it halts mid-write, leaving the session in an inconsistent state.
This is a known architectural limitation being addressed in the OpenClaw roadmap. The GitHub issues #28661 and #37834 both have comments from the core team acknowledging that a proper graceful abort mechanism is needed. Until that fix is released, the workarounds above are the practical solution.
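Roughly, a graceful abort would synthesize error results for every in-flight tool call before persisting the session, so the history always satisfies the pairing rule. This is a sketch of the idea proposed in the issues, not shipped OpenClaw code:

```python
def abort_gracefully(messages, pending_tool_ids):
    """Close out in-flight tool calls before halting.

    On any abort signal (user stop, timeout, network drop), append
    synthetic error results for every tool_use that has not completed,
    so the persisted history still satisfies the pairing rule.
    Sketch of the fix discussed in the GitHub issues, not shipped code.
    """
    if pending_tool_ids:
        messages.append({
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tid,
                    "content": "[Tool execution was aborted before completion.]",
                    "is_error": True,
                }
                for tid in pending_tool_ids
            ],
        })
    return messages
```

The key property is ordering: the synthetic results must be written before the halt, not after, so that even a crash during shutdown leaves a loadable session.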
For users who run OpenClaw in production environments — handling real business tasks, customer communications, or financial operations — this behavior is unacceptable for fully autonomous operation. The recommended mitigation is to combine the prevention strategies above with a monitoring setup that alerts you when the 400 loop begins, so you can intervene quickly before the corruption persists for hours.
Set Up 400 Loop Monitoring
# Add to your HEARTBEAT.md:
## Health Monitoring
Every 10 minutes, silently check: did the last 3 responses
all return 400 errors? If yes, send me an alert:
"⚠️ OpenClaw may be in a 400 loop. Last 3 responses were all
400 errors. Possible session corruption. Consider /reset."
# Or set up external monitoring via cron:
# cron: every 5 minutes, ping openclaw status endpoint
# if response is 400 three times in a row, send Telegram alert
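The consecutive-400 rule in that cron idea reduces to a small counter. A sketch; how you poll the status endpoint and deliver the Telegram alert is deployment-specific:

```python
from collections import deque

class LoopDetector:
    """Track recent HTTP status codes; alert after N consecutive 400s.

    Sketch for the cron-based monitor described above. Fetching the
    status code (OpenClaw status endpoint) and sending the alert
    (Telegram, etc.) are left to your deployment.
    """

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.recent = deque(maxlen=threshold)  # only the last N codes matter

    def record(self, status_code):
        """Record one probe result; return True if an alert should fire."""
        self.recent.append(status_code)
        return (len(self.recent) == self.threshold
                and all(c == 400 for c in self.recent))
```

Because the deque is bounded, a single successful response resets the streak and the detector stays quiet until another full run of 400s accumulates.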
How Network Stability Prevents This Class of Bug
The most reliable way to prevent parallel tool call corruption is to ensure that when the AI model issues tool calls and OpenClaw begins executing them, the connection remains stable for the entire duration. A dropped connection mid-execution is the most common cause of orphaned tool IDs.
VPN07 provides consistently stable, 1000Mbps connections across 70+ countries. For users running OpenClaw as a 24/7 personal assistant or automation engine, this stability is critical. When a network-sensitive task like parallel API calls is interrupted by an unstable connection, the resulting session corruption can take more time to recover from than the original task would have taken to complete. A stable VPN connection is not a luxury for serious OpenClaw users — it's infrastructure.
Network Stability Checklist for Parallel Tool Users
- For home server deployments, use Ethernet, not Wi-Fi, for long-running agents.
- Enable VPN keepalive packets to prevent idle timeouts during long tool executions.
- Configure toolTimeout: 30000 to fail fast rather than hang indefinitely.
- Never interrupt parallel execution with a hard stop unless absolutely necessary.
Complete Recovery & Prevention Checklist
Recovery (If Already Corrupted)
- Stop the gateway: openclaw gateway stop
- Scan your session files for orphaned tool_use IDs (bulk health check above)
- Repair the corrupted session file with the repair script, or delete it, then restart the gateway
Prevention (Going Forward)
- Use /stop --after-tools instead of /stop
- Set maxParallelTools: 2 to limit concurrency exposure
VPN07 — The Reliable Foundation for OpenClaw
Stop network drops from corrupting your parallel tool executions
The most effective prevention for parallel tool call corruption is a network that never drops mid-execution. VPN07 has been operating for 10 years with proven stability, delivering 1000Mbps bandwidth across 70+ countries. For users running OpenClaw as a 24/7 automation engine, a rock-solid network connection isn't optional: it's the difference between a productive agent and an afternoon spent repairing corrupted sessions. Try it risk-free with a 30-day money-back guarantee.