AI Today Brief

The daily AI-engineering brief. Built in public. EN · UA.


© 2026 AI Today Brief. All rights reserved.

Agents & MCP

Why frontier Anthropic models are performing worse on strict tool calling schemas

July 5, 2026 · 6 min read
Curated by Oleksandr Kuzmenko, AI Product Engineer · Updated July 5, 2026 · Sources cited on every story
AI-assisted · editor-reviewed
Newer models like Opus 4.8 and Sonnet 5 are failing on nested tool arguments by emitting made-up keys. This degradation occurs because their post-training optimizes for Claude Code's highly forgiving client, which silently repairs malformed calls.

Impact: High

Why it matters

You must design simpler, flatter schemas or implement lenient parsers in your agent frameworks to prevent unexpected model-side tool failures.

TL;DR

  • 01 Newer SOTA models are heavily optimized for the forgiving environment of Claude Code's client.
  • 02 Custom agent frameworks must replicate Claude Code's forgiving parsing layer to avoid tool failures.
  • 03 Stripping thinking blocks from the conversation history before the next request can cut schema failures in half.

Key facts

Models affected: Claude Opus 4.8, Sonnet 5
Failure rate in long history: approximately 20%
Impact of removing thinking blocks: 50% failure reduction

The Claude Code Forgiveness Trap

An investigation into Anthropic's latest SOTA models, including Opus 4.8 and Sonnet 5, shows a surprising degradation when they call structured tools with nested schemas, such as Pi's edits[] array. The models tend to invent trailing keys inside JSON objects, including type, id, kind, unique, matchCase, and in_file. The intended payload remains byte-correct, but the extra keys cause schema validation to fail.
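
A minimal sketch of why these calls fail, assuming a strict schema that forbids undeclared properties. The hand-rolled findUnexpectedKeys helper is hypothetical and stands in for a real JSON Schema validator, and the old_string/new_string entry shape is an assumption borrowed from the alias list later in this article, not Pi's actual schema.

```javascript
// Hypothetical allow-list for one entry of an edits[] array (assumed shape).
const ALLOWED_EDIT_KEYS = new Set(["old_string", "new_string"]);

// Return the path of every key that the schema does not declare.
// A strict validator would reject the whole call if this list is non-empty.
const findUnexpectedKeys = (edits) =>
  edits.flatMap((edit, index) =>
    Object.keys(edit)
      .filter((key) => !ALLOWED_EDIT_KEYS.has(key))
      .map((key) => `edits[${index}].${key}`)
  );

// The replacement itself is correct, but the invented trailing key breaks validation.
const modelCall = {
  edits: [{ old_string: "foo", new_string: "bar", matchCase: true }],
};

console.log(findUnexpectedKeys(modelCall.edits)); // [ 'edits[0].matchCase' ]
```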

This behavior is highly context-dependent. It does not appear in single-turn prompts but manifests in long agentic transcripts where the model has processed multiple file reads and diagnostics. Stripping thinking blocks from the history reduces the failure rate by 50%.
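
The history-stripping step can be sketched as follows. This assumes an Anthropic-style message list in which assistant content is an array of typed blocks and thinking appears as blocks of type thinking or redacted_thinking; the function name stripThinking is hypothetical.

```javascript
// Block types treated as thinking (assumed from Anthropic-style content blocks).
const THINKING_TYPES = new Set(["thinking", "redacted_thinking"]);

// Return a copy of the history with thinking blocks removed from assistant turns.
// Messages whose content is a plain string are passed through untouched.
const stripThinking = (messages) =>
  messages
    .map((message) =>
      message.role === "assistant" && Array.isArray(message.content)
        ? {
            ...message,
            content: message.content.filter((block) => !THINKING_TYPES.has(block.type)),
          }
        : message
    )
    // Drop turns that contained nothing but thinking, so no empty content array is sent.
    .filter((message) => !(Array.isArray(message.content) && message.content.length === 0));
```

Some API configurations may require thinking blocks to be passed back on the most recent assistant turn when extended thinking is combined with tool use, so check your provider's documentation before applying this to the entire history.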

How Claude Code Handles Slop

Claude Code's internal client is remarkably forgiving. Analysis of its minified codebase reveals that it actively detects and repairs malformed tool calls:

  • Detects leaked <invoke> markup in visible text and retries.
  • Runs a state machine to push back on bad calls.
  • Restores broken Unicode sequences and lone surrogates.
  • Maps aliases automatically: accepts old_str alongside old_string, and new_str with new_string.
  • Silently filters out unexpected keys and bypasses strict schemas to avoid Anthropic's schema complexity limits.
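
Two of the repairs listed above are easy to approximate in your own client. The sketch below is an assumption about the general technique, not a reconstruction of Claude Code's implementation: hasLeakedInvoke and repairLoneSurrogates are hypothetical names, and replacing lone surrogates with U+FFFD is only one possible repair strategy.

```javascript
// Detect tool-call markup that leaked into the visible text of a response.
// A caller could use this signal to discard the turn and retry.
const hasLeakedInvoke = (text) => /<invoke\b/.test(text);

// Replace unpaired UTF-16 surrogates with U+FFFD so the string can be
// serialized and compared safely. Valid surrogate pairs are left alone.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

const repairLoneSurrogates = (text) => text.replace(LONE_SURROGATE, "\uFFFD");
```

Newer JavaScript runtimes also provide String.prototype.toWellFormed, which performs the same lone-surrogate replacement.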

Because the reinforcement learning loop occurs within this forgiving ecosystem, the models are trained to expect silent client-side corrections.

Try it in 2 minutes

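```javascript
// Normalize aliased argument names into the canonical edit shape.
// Keys not listed here (type, id, kind, ...) are silently dropped.
const cleanArgs = (args) => {
  // Use ?? rather than ||, so an empty string (e.g. deleting text) is preserved.
  const file_path = args.path ?? args.file_path;
  const old_string = args.old_str ?? args.old_string;
  const new_string = args.new_str ?? args.new_string;
  return { file_path, old_string, new_string };
};
```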
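
The same idea extends to nested arguments, where the invented keys are reported to appear. This is a sketch under assumptions: the tool is taken to accept a file_path plus an edits[] array of old_string/new_string objects, and cleanEdit and cleanNestedArgs are hypothetical helpers. Only the alias pairs come from the list above.

```javascript
// Canonical key -> accepted spellings, in order of preference.
const EDIT_ALIASES = {
  old_string: ["old_string", "old_str"],
  new_string: ["new_string", "new_str"],
};

// Keep only canonical keys on one edit, resolving aliases and dropping the rest.
const cleanEdit = (edit) => {
  const cleaned = {};
  for (const [canonical, aliases] of Object.entries(EDIT_ALIASES)) {
    const found = aliases.find((name) => edit[name] !== undefined);
    if (found !== undefined) cleaned[canonical] = edit[found];
  }
  return cleaned;
};

// Sanitize the whole call before handing it to a strict validator.
const cleanNestedArgs = (args) => ({
  file_path: args.file_path ?? args.path,
  edits: Array.isArray(args.edits) ? args.edits.map(cleanEdit) : [],
});

console.log(
  cleanNestedArgs({
    path: "a.js",
    edits: [{ old_str: "x", new_string: "", kind: "replace" }],
  })
);
```

Running a strict validator after this step still catches genuinely missing fields, while invented keys such as kind no longer fail the call.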

✓ When to use

  • When building agentic workflows where tool invocation resilience is critical.
  • When transitioning from old Anthropic models to Opus 4.8 or Sonnet 5.

✕ When NOT to use

  • When your agent relies only on basic single-turn text completions without structured tools.

What to do today

  • → Simplify custom tool schemas: keep them flat and avoid nested objects inside arrays.
  • → Implement silent filtering of unexpected JSON keys in your agent's API response parser.
  • → Test disabling strict mode for Anthropic tool calls if you are hitting schema complexity errors.
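
To make the first recommendation concrete, here is a hedged sketch contrasting a nested schema with a flat one. Both tool definitions are hypothetical and use the name/description/input_schema layout of Anthropic-style tool definitions; hasNestedObjects is an illustrative helper for flagging nested shapes in your own schemas.

```javascript
// Nested shape: an array of objects, the pattern the article reports as fragile.
const nestedEditTool = {
  name: "edit_file_nested", // hypothetical tool name
  description: "Apply several replacements to one file.",
  input_schema: {
    type: "object",
    properties: {
      file_path: { type: "string" },
      edits: {
        type: "array",
        items: {
          type: "object",
          properties: { old_string: { type: "string" }, new_string: { type: "string" } },
          required: ["old_string", "new_string"],
        },
      },
    },
    required: ["file_path", "edits"],
  },
};

// Flat shape: one replacement per call, only top-level scalar fields.
const flatEditTool = {
  name: "edit_file_flat", // hypothetical tool name
  description: "Apply one replacement to one file.",
  input_schema: {
    type: "object",
    properties: {
      file_path: { type: "string" },
      old_string: { type: "string" },
      new_string: { type: "string" },
    },
    required: ["file_path", "old_string", "new_string"],
  },
};

// True if any top-level property is an object or an array of objects.
const hasNestedObjects = (schema) =>
  Object.values(schema.properties ?? {}).some(
    (prop) => prop.type === "object" || (prop.type === "array" && prop.items?.type === "object")
  );

console.log(hasNestedObjects(nestedEditTool.input_schema)); // true
console.log(hasNestedObjects(flatEditTool.input_schema)); // false
```

The trade-off is more tool calls per task, since each call carries a single replacement.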
#Claude Code #Claude Opus #Claude Sonnet

Sources

  • Better Models: Worse Tools