Why frontier Anthropic models are performing worse on strict tool calling schemas
Newer models like Opus 4.8 and Sonnet 5 are failing on nested tool arguments by emitting made-up keys. This degradation occurs because their post-training optimizes for Claude Code's highly forgiving client, which silently repairs malformed calls.
Impact: High
Why it matters
You must design simpler, flatter schemas or implement lenient parsers in your agent frameworks to prevent unexpected model-side tool failures.
TL;DR
- Newer SOTA models are heavily optimized for the forgiving environment of Claude Code's client.
- Your custom agent frameworks must replicate Claude Code's forgiving parsing layer to avoid tool failures.
- Stripping thinking blocks from history before sending tool calls can reduce schema failures by half.
Key facts
- Models affected: Claude Opus 4.8, Sonnet 5
- Failure rate in long history: Approximately 20%
- Impact of removing thinking blocks: 50% failure reduction
The Claude Code Forgiveness Trap
Investigation into Anthropic's latest SOTA models, including Opus 4.8 and Sonnet 5, shows a surprising degradation when calling structured tools with nested schemas, such as Pi's edits[] array. The models tend to invent trailing keys inside JSON objects, including type, id, kind, unique, matchCase, and in_file. While the payload remains byte-correct, the schema validation fails.
This behavior is highly context-dependent. It does not appear in single-turn prompts but manifests in long agentic transcripts where the model has processed multiple file reads and diagnostics. Stripping thinking blocks from the history reduces the failure rate by 50%.
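A minimal sketch of stripping thinking blocks from a transcript before it is resent. It assumes a Messages-API-style history in which each message's content is either a string or an array of typed blocks, and that reasoning appears as blocks of type thinking or redacted_thinking. The helper name stripThinkingBlocks and the sample history are hypothetical.

```javascript
// Hypothetical helper: remove reasoning blocks from prior assistant turns.
// Assumes `content` is either a plain string or an array of typed blocks.
const THINKING_TYPES = new Set(["thinking", "redacted_thinking"]);

function stripThinkingBlocks(messages) {
  return messages.map((message) => {
    // Leave user turns and plain-string content untouched.
    if (message.role !== "assistant" || !Array.isArray(message.content)) {
      return message;
    }
    const content = message.content.filter(
      (block) => !THINKING_TYPES.has(block.type)
    );
    // Return a copy so the original history is not mutated.
    return { ...message, content };
  });
}

// Illustrative transcript; the ids, tool name, and paths are made up.
const history = [
  { role: "user", content: "Rename foo to bar in src/a.js" },
  {
    role: "assistant",
    content: [
      { type: "thinking", thinking: "I should read the file first." },
      { type: "tool_use", id: "toolu_1", name: "read", input: { path: "src/a.js" } },
    ],
  },
];

const cleaned = stripThinkingBlocks(history);
console.log(cleaned[1].content.map((block) => block.type)); // [ 'tool_use' ]
```

Whether a provider accepts a history with thinking blocks removed may depend on the thinking configuration in use, so this is worth verifying against the API documentation before relying on it.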
How Claude Code Handles Slop
Claude Code's internal client is incredibly forgiving. Analysis of its minified codebase reveals that it actively detects and repairs malformed model output:
- Detects leaked <invoke> markup in visible text and retries.
- Runs a state machine to push back on bad calls.
- Restores broken Unicode sequences and lone surrogates.
- Maps aliases automatically: accepts old_str alongside old_string, and new_str alongside new_string.
- Silently filters out unexpected keys and bypasses strict schemas to avoid Anthropic's schema complexity limits.
Because the reinforcement learning loop occurs within this forgiving ecosystem, the models are trained to expect silent client-side corrections.
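To illustrate the silent key filtering described above, here is a rough sketch that drops unexpected keys from each entry of a nested edits[] array before validation. The allow-list, the function names pickKnownKeys and sanitizeEdits, and the sample payload are assumptions for illustration; this is not Claude Code's or Pi's actual implementation.

```javascript
// Hypothetical allow-list for one entry of an edits[] array.
const EDIT_KEYS = ["old_string", "new_string"];

// Keep only allow-listed keys and report which keys were dropped.
function pickKnownKeys(obj, allowedKeys) {
  const kept = {};
  const dropped = [];
  for (const [key, value] of Object.entries(obj)) {
    if (allowedKeys.includes(key)) kept[key] = value;
    else dropped.push(key);
  }
  return { kept, dropped };
}

function sanitizeEdits(input) {
  const droppedKeys = [];
  const edits = (Array.isArray(input.edits) ? input.edits : [])
    // Ignore entries that are not objects at all.
    .filter((edit) => edit !== null && typeof edit === "object")
    .map((edit) => {
      const result = pickKnownKeys(edit, EDIT_KEYS);
      droppedKeys.push(...result.dropped);
      return result.kept;
    });
  return { args: { file_path: input.file_path, edits }, droppedKeys };
}

// Sample call carrying two of the invented keys mentioned in the article.
const raw = {
  file_path: "src/a.js",
  edits: [
    { old_string: "foo", new_string: "bar", matchCase: true, in_file: "src/a.js" },
  ],
};

const sanitized = sanitizeEdits(raw);
console.log(JSON.stringify(sanitized.args.edits)); // [{"old_string":"foo","new_string":"bar"}]
console.log(sanitized.droppedKeys); // [ 'matchCase', 'in_file' ]
```

Returning the dropped keys alongside the cleaned arguments keeps the repair silent from the model's point of view while still letting the framework log how often it happens.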
Try it in 2 minutes
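const cleanArgs = (args) => {
  // Accept either alias. ?? (unlike ||) keeps an empty string,
  // which is a valid replacement when the edit deletes text.
  const path = args.path ?? args.file_path;
  const old_string = args.old_str ?? args.old_string;
  const new_string = args.new_str ?? args.new_string;
  // Returning only the known keys silently drops any invented ones.
  return { file_path: path, old_string, new_string };
};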
✓ When to use
- When building agentic workflows where tool invocation resilience is critical.
- When transitioning from old Anthropic models to Opus 4.8 or Sonnet 5.
✕ When NOT to use
- Not applicable if your agent only relies on basic single-turn text completions without structured tools.
What to do today
- Simplify custom tool schemas to keep them flat and avoid nested objects inside arrays.
- Implement silent filtering of unexpected JSON keys in your agent's API response parser.
- Test disabling strict mode for Anthropic tool calls if you are hitting schema complexity errors.
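As one way to act on the first item, the sketch below contrasts a nested tool definition with a flat one and includes a small checker that reports where a schema places objects inside arrays. Both tool definitions and the findObjectArrays helper are hypothetical; only the name and input_schema fields follow the usual tool-definition shape.

```javascript
// Hypothetical nested definition: an array of edit objects.
const nestedTool = {
  name: "edit_file",
  input_schema: {
    type: "object",
    properties: {
      file_path: { type: "string" },
      edits: {
        type: "array",
        items: {
          type: "object",
          properties: {
            old_string: { type: "string" },
            new_string: { type: "string" },
          },
          required: ["old_string", "new_string"],
        },
      },
    },
    required: ["file_path", "edits"],
  },
};

// Hypothetical flat alternative: one edit per call, no nesting.
const flatTool = {
  name: "edit_file",
  input_schema: {
    type: "object",
    properties: {
      file_path: { type: "string" },
      old_string: { type: "string" },
      new_string: { type: "string" },
    },
    required: ["file_path", "old_string", "new_string"],
  },
};

// Returns the property paths where an array holds objects.
function findObjectArrays(schema, path = "") {
  if (schema === null || typeof schema !== "object") return [];
  const found = [];
  if (schema.type === "array" && schema.items && schema.items.type === "object") {
    found.push(path || "(root)");
  }
  for (const [key, child] of Object.entries(schema.properties ?? {})) {
    found.push(...findObjectArrays(child, path ? `${path}.${key}` : key));
  }
  if (schema.items) found.push(...findObjectArrays(schema.items, `${path}[]`));
  return found;
}

console.log(findObjectArrays(nestedTool.input_schema)); // [ 'edits' ]
console.log(findObjectArrays(flatTool.input_schema)); // []
```

The trade-off is that a flat tool needs one call per edit, so a batch of changes costs more round trips in exchange for a schema with fewer places for invented keys to appear.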
Sources