Why frontier Anthropic models are performing worse on strict tool calling schemas

Agents & MCP

July 5, 2026 · 6 min read

Curated by Oleksandr Kuzmenko, AI Product Engineer · Updated July 5, 2026 · Sources cited on every story

AI-assisted · editor-reviewed


Newer models like Opus 4.8 and Sonnet 5 are failing on nested tool arguments by emitting made-up keys. This degradation occurs because their post-training optimizes for Claude Code's highly forgiving client, which silently repairs malformed calls.

Impact: High

Why it matters

You must design simpler, flatter schemas or implement lenient parsers in your agent frameworks to prevent unexpected model-side tool failures.

TL;DR

01 Newer SOTA models are heavily optimized for the forgiving environment of Claude Code's client.
02 Custom agent frameworks may need to replicate parts of Claude Code's forgiving parsing layer to avoid tool failures.
03 Stripping thinking blocks from the conversation history can cut schema failures by about half.

Key facts

Models affected: Claude Opus 4.8, Sonnet 5
Failure rate in long agentic transcripts: approximately 20%
Effect of stripping thinking blocks: roughly 50% fewer failures

The Claude Code Forgiveness Trap

Investigation into Anthropic's latest SOTA models, including Opus 4.8 and Sonnet 5, shows a surprising degradation when they call structured tools with nested schemas, such as Pi's edits[] array. The models tend to invent trailing keys inside the nested JSON objects, including type, id, kind, unique, matchCase, and in_file. The edit content itself stays byte-correct, but the extra keys cause schema validation to fail.
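To make the failure mode concrete, here is a minimal sketch of what a strict validator does with such a payload. It mimics `additionalProperties: false` on each item of an `edits[]` array. The allowed key names (`old_string`, `new_string`) and the helper `strictValidateEdits` are hypothetical, chosen for illustration; Pi's actual schema may differ.

```javascript
// Hypothetical allowed keys for each item in an edits[] array.
// Pi's real schema may differ; this only illustrates strict validation.
const ALLOWED_EDIT_KEYS = new Set(["old_string", "new_string"]);

// Mimic `additionalProperties: false` on each array item:
// return one error message per invented key, tagged with its array index.
function strictValidateEdits(args) {
  if (!Array.isArray(args.edits)) {
    return ["edits must be an array"];
  }
  const errors = [];
  args.edits.forEach((edit, i) => {
    if (edit === null || typeof edit !== "object") {
      errors.push(`edits[${i}]: not an object`);
      return;
    }
    for (const key of Object.keys(edit)) {
      if (!ALLOWED_EDIT_KEYS.has(key)) {
        errors.push(`edits[${i}]: unexpected key "${key}"`);
      }
    }
  });
  return errors;
}

// The edit content is exactly right, but one trailing invented key
// (matchCase, one of the keys reported above) is enough to fail validation.
const payload = {
  path: "src/app.ts",
  edits: [
    { old_string: "let x = 1;", new_string: "const x = 1;", matchCase: true },
  ],
};

const errors = strictValidateEdits(payload);
// errors is ['edits[0]: unexpected key "matchCase"']
```

A strict client rejects this call outright, even though applying the edit would have produced the intended file.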

This behavior is highly context-dependent. It does not appear in single-turn prompts but manifests in long agentic transcripts where the model has processed multiple file reads and diagnostics. Stripping thinking blocks from the history reduces the failure rate by 50%.
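A minimal sketch of the history-stripping mitigation, assuming Messages-API-style history in which assistant `content` is an array of typed blocks (`thinking`, `redacted_thinking`, `text`, `tool_use`). It leaves the most recent assistant turn untouched, because Anthropic's extended-thinking documentation states that during tool use the thinking blocks of the last assistant turn must be passed back unmodified; check the current docs for your model before relying on this. `stripThinkingFromHistory` is a hypothetical helper name.

```javascript
// Reasoning block types to strip. Assumes Anthropic Messages API content
// blocks; adjust if your transcript format differs.
const THINKING_TYPES = new Set(["thinking", "redacted_thinking"]);

// Return a copy of `messages` with thinking blocks removed from every
// assistant turn except the most recent one, which is left unmodified.
// A turn that would end up with no content at all is also left as-is.
function stripThinkingFromHistory(messages) {
  let lastAssistant = -1;
  messages.forEach((m, i) => {
    if (m.role === "assistant") lastAssistant = i;
  });

  return messages.map((m, i) => {
    if (m.role !== "assistant" || i === lastAssistant || !Array.isArray(m.content)) {
      return m;
    }
    const content = m.content.filter((block) => !THINKING_TYPES.has(block.type));
    return content.length > 0 ? { ...m, content } : m;
  });
}

// Example transcript; tool names, ids, and signatures are placeholders.
const history = [
  { role: "user", content: "Fix the failing test." },
  {
    role: "assistant",
    content: [
      { type: "thinking", thinking: "Read the test first.", signature: "sig-1" },
      { type: "tool_use", id: "toolu_01", name: "read", input: { path: "test.js" } },
    ],
  },
  {
    role: "user",
    content: [{ type: "tool_result", tool_use_id: "toolu_01", content: "file contents" }],
  },
  {
    role: "assistant",
    content: [
      { type: "thinking", thinking: "Now apply the edit.", signature: "sig-2" },
      { type: "tool_use", id: "toolu_02", name: "edit", input: { path: "test.js" } },
    ],
  },
];

const trimmed = stripThinkingFromHistory(history);
// trimmed[1].content now holds only the tool_use block; trimmed[3] is unchanged.
```

The function returns new message objects rather than mutating the transcript, so the full history stays available for logging or replay.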

How Claude Code Handles Slop

Claude Code's client is extremely forgiving. Analysis of its minified code shows that it actively detects and repairs malformed tool calls:

Detects leaked <invoke> markup in visible text and retries.
Runs a state machine to push back on bad calls.
Repairs broken Unicode sequences, including lone surrogates.
Maps argument aliases automatically, accepting old_str for old_string and new_str for new_string.
Silently filters out unexpected keys, and skips strict schemas to stay clear of Anthropic's schema complexity limits.
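Two of the repairs listed above are easy to approximate in your own client. This is not Claude Code's code, which is minified and not reproduced here; `repairLoneSurrogates` and `hasLeakedInvokeMarkup` are hypothetical helpers. The first replaces unpaired UTF-16 surrogates with U+FFFD, one simple way to make the text valid (Claude Code's actual repair may differ), and gives the same result as the built-in `String.prototype.toWellFormed()` on runtimes that support it. The second flags `<invoke` markup in visible text so the agent loop can retry instead of treating it as a final answer.

```javascript
// Matches a high surrogate not followed by a low surrogate, or a low
// surrogate not preceded by a high surrogate. Neither is valid alone in UTF-16.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Replace each lone surrogate with U+FFFD so the text can be serialized
// safely. Same result as str.toWellFormed() on runtimes that support it.
function repairLoneSurrogates(str) {
  return str.replace(LONE_SURROGATE, "\uFFFD");
}

// Flag tool-call markup that leaked into visible assistant text instead of
// arriving as a structured tool_use block. A match is a cue to retry the turn.
function hasLeakedInvokeMarkup(text) {
  return /<invoke\b/.test(text);
}

const repaired = repairLoneSurrogates("caf\u00E9 \uD83D"); // "caf\u00E9 \uFFFD"
const leaked = hasLeakedInvokeMarkup('Calling the tool now: <invoke name="edit">'); // true
```

Valid surrogate pairs such as emoji pass through unchanged; only unpaired halves are replaced.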

Because reinforcement learning takes place inside this forgiving environment, the models learn to expect silent client-side corrections.

Try it in 2 minutes

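```javascript
// Normalize common argument aliases and drop every key the tool does not use.
// `??` instead of `||` keeps legitimate empty strings, e.g. new_string: "" to delete text.
const cleanArgs = (args) => {
  const file_path = args.file_path ?? args.path;
  const old_string = args.old_string ?? args.old_str;
  const new_string = args.new_string ?? args.new_str;
  return { file_path, old_string, new_string };
};
```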

✓ When to use

When building agentic workflows where tool invocation resilience is critical.
When transitioning from older Anthropic models to Opus 4.8 or Sonnet 5.

✕ When NOT to use

Not applicable if your agent relies only on basic single-turn text completions without structured tools.

What to do today

Simplify custom tool schemas to keep them flat and avoid nested objects inside arrays.
Implement silent filtering of unexpected JSON keys in your agent's API response parser.
Test disabling strict mode for Anthropic tool calls if you hit schema complexity errors.
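The first two steps can be sketched together. Both functions are hypothetical helpers, not part of any Anthropic or Pi API. `findArraysOfObjects` walks a JSON Schema, following only `properties` and `items` (`$ref`, `anyOf`, and similar keywords are out of scope), and reports every array whose items are objects, the nested shape this article flags as fragile. `sanitizeEdits` extends the alias-and-filter idea of `cleanArgs` to each item of an `edits[]` array; the key names are assumed for illustration.

```javascript
// Walk a JSON Schema and return JSON-pointer-style paths of arrays whose
// items are objects. Only `properties` and `items` are followed.
function findArraysOfObjects(schema, path = "#") {
  const hits = [];
  if (!schema || typeof schema !== "object") return hits;

  if (schema.type === "array" && schema.items) {
    if (schema.items.type === "object") hits.push(path);
    hits.push(...findArraysOfObjects(schema.items, `${path}/items`));
  }
  for (const [name, sub] of Object.entries(schema.properties ?? {})) {
    hits.push(...findArraysOfObjects(sub, `${path}/properties/${name}`));
  }
  return hits;
}

// Hypothetical lenient parser for an edit tool that takes edits[]:
// map aliases inside each item and silently drop keys the tool does not use.
function sanitizeEdits(args) {
  const edits = Array.isArray(args.edits) ? args.edits : [];
  return {
    file_path: args.file_path ?? args.path,
    edits: edits
      .filter((e) => e !== null && typeof e === "object")
      .map((e) => ({
        old_string: e.old_string ?? e.old_str,
        new_string: e.new_string ?? e.new_str,
      })),
  };
}

// A nested schema of the kind the lint is meant to catch.
const nestedSchema = {
  type: "object",
  properties: {
    path: { type: "string" },
    edits: {
      type: "array",
      items: {
        type: "object",
        properties: { old_string: { type: "string" }, new_string: { type: "string" } },
        required: ["old_string", "new_string"],
        additionalProperties: false,
      },
    },
  },
};

const nestedPaths = findArraysOfObjects(nestedSchema); // ["#/properties/edits"]

const sanitized = sanitizeEdits({
  path: "src/app.ts",
  edits: [{ old_str: "let x = 1;", new_string: "const x = 1;", kind: "replace", id: 1 }],
});
// sanitized is { file_path: "src/app.ts",
//   edits: [{ old_string: "let x = 1;", new_string: "const x = 1;" }] }
```

Silently dropping keys hides model drift, so logging what was removed is a cheap way to keep it visible while the agent keeps working.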

#Claude Code · #Claude Opus · #Claude Sonnet

Sources

Better Models: Worse Tools
