Ornith-1.0: Self-Scaffolding Open-Source Models for Agentic Coding Tasks

Models & research

June 29, 2026 · 3 min read

Curated by Oleksandr Kuzmenko, AI Product Engineer · Updated June 29, 2026 · Sources cited on every story

AI-assisted · editor-reviewed

Deep Reinforce has introduced Ornith-1.0, a family of self-improving models for agentic coding that ranges from 9B to 397B parameters. By co-evolving task-specific scaffolds with the model's policy, it achieves competitive performance on coding benchmarks.

Why it matters

Ornith-1.0 moves away from fixed, human-designed harnesses and lets the model autonomously develop the orchestration logic that complex coding tasks require.

TL;DR

01 Self-improving scaffold architecture.
02 Reduces reliance on human-designed test harnesses.
03 Multi-layered approach to prevent reward hacking.

Self-Improving Scaffold Co-Evolution

Ornith-1.0 is trained with a framework in which the scaffold co-evolves with the policy. During RL, the model first proposes a task-specific scaffold, then generates a solution rollout conditioned on that scaffold. The reward optimizes both the orchestrator and the executor, so strategies emerge autonomously rather than from a hand-designed harness.
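The propose-then-rollout loop described above can be illustrated with a toy example. This is a minimal sketch and not Deep Reinforce's training code: the scaffold and action names, the 2x2 reward table, and the REINFORCE update with a running baseline are all assumptions chosen to make the idea concrete. It shows only one thing, that a single reward signal can update both an "orchestrator" head (which scaffold to propose) and an "executor" head (what to do given that scaffold).

```python
import math
import random

# Hypothetical toy setup: none of these names come from Ornith-1.0.
SCAFFOLDS = ["plan_then_edit", "edit_directly"]
ACTIONS = ["run_tests_first", "patch_blindly"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(scaffold, action):
    """Assumed environment: only one (scaffold, action) pair solves the task."""
    return 1.0 if (scaffold, action) == (0, 0) else 0.0


def train(steps=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    orch = [0.0] * len(SCAFFOLDS)                    # orchestrator logits
    exe = [[0.0] * len(ACTIONS) for _ in SCAFFOLDS]  # executor logits per scaffold
    baseline = 0.0                                   # running average of reward
    for _ in range(steps):
        # 1. Propose a task-specific scaffold.
        p_s = softmax(orch)
        s = rng.choices(range(len(SCAFFOLDS)), weights=p_s)[0]
        # 2. Generate a solution rollout conditioned on that scaffold.
        p_a = softmax(exe[s])
        a = rng.choices(range(len(ACTIONS)), weights=p_a)[0]
        # 3. One reward drives the update of both heads (REINFORCE).
        r = toy_reward(s, a)
        adv = r - baseline
        baseline += 0.05 * (r - baseline)
        for i in range(len(orch)):
            orch[i] += lr * adv * ((1.0 if i == s else 0.0) - p_s[i])
        for j in range(len(exe[s])):
            exe[s][j] += lr * adv * ((1.0 if j == a else 0.0) - p_a[j])
    return softmax(orch), [softmax(row) for row in exe]


if __name__ == "__main__":
    scaffold_probs, action_probs = train()
    print("scaffold preference:", scaffold_probs)
    print("action preference under scaffold 0:", action_probs[0])
```

In this toy, the orchestrator learns to prefer the scaffold under which the executor can earn reward, and the executor learns the action that works under that scaffold. A real system would replace both lookup tables with the same language model and the reward table with task execution.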

Mitigating Reward Hacking

To prevent reward hacking, Ornith-1.0 uses three isolation layers: an immutable outer trust boundary, a deterministic monitor, and a frozen LLM judge that acts as a veto.
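One way to picture three stacked checks is as a chain of gates in front of the reward. The sketch below is illustrative only: the article names the layers but not how they work, so the `Rollout` fields, the protected paths, the banned patterns, and the stub judge are all hypothetical stand-ins. A frozen LLM judge would take the place of `stub_judge`.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Rollout:
    """Hypothetical record of one solution attempt."""
    touched_paths: Tuple[str, ...]
    tests_passed: bool
    transcript: str


# Assumed layer 1: paths the policy must never modify.
PROTECTED_PREFIXES = ("tests/", "grader/")

# Assumed layer 2: fixed patterns a rule-based monitor rejects.
BANNED_PATTERNS = ("pytest.skip", "sys.exit(0)")


def outer_boundary_ok(rollout: Rollout) -> bool:
    """Immutable trust boundary: reject any write to protected paths."""
    return not any(p.startswith(PROTECTED_PREFIXES) for p in rollout.touched_paths)


def monitor_ok(rollout: Rollout) -> bool:
    """Deterministic monitor: the same transcript always gets the same verdict."""
    return not any(pattern in rollout.transcript for pattern in BANNED_PATTERNS)


def stub_judge(rollout: Rollout) -> bool:
    """Placeholder for a frozen LLM judge; returns True to veto the reward."""
    return "hardcode expected output" in rollout.transcript.lower()


def gated_reward(rollout: Rollout, judge: Callable[[Rollout], bool]) -> float:
    """Grant reward only if every layer lets the rollout through."""
    if not outer_boundary_ok(rollout):
        return 0.0
    if not monitor_ok(rollout):
        return 0.0
    if judge(rollout):  # the judge can only veto, never add reward
        return 0.0
    return 1.0 if rollout.tests_passed else 0.0


if __name__ == "__main__":
    clean = Rollout(("src/app.py",), True, "fixed off-by-one in src/app.py")
    tampered = Rollout(("tests/test_app.py",), True, "relaxed an assertion")
    print(gated_reward(clean, stub_judge))
    print(gated_reward(tampered, stub_judge))
```

The design choice worth noting in this sketch is that the judge sits last and can only remove reward. Under that assumption, a policy that learns to flatter the judge gains nothing unless it also passes the two rule-based layers.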

#Gemma 4 #Qwen 3.5 #Terminal-Bench 2.1 #SWE-Bench Verified
