AI Today Brief

The daily AI-engineering brief. Built in public. EN · UA.

Token & cost optimization

NVIDIA HORIZON uses git worktrees and prompt caching to build hardware agents

July 4, 2026 · 5 min read
Curated by Oleksandr Kuzmenko, AI Product Engineer · Updated July 4, 2026 · Sources cited on every story
AI-assisted · editor-reviewed

NVIDIA Research has introduced HORIZON, an agentic framework for RTL design that reports a 100% pass rate on the hardware benchmarks it was evaluated on. By hosting each problem as a git worktree and reusing persistent model sessions, it serves 91% of its tokens from the prompt cache. Both figures are self-reported.

Impact: Medium

Why it matters

Adopt HORIZON's pattern of reusing active model sessions and caching stable codebase sources to keep your multi-step agent costs minimal.

TL;DR

  • 01 Reports a 100% pass rate on RTL benchmarks using iterative, state-preserving loops.
  • 02 Reuses model sessions so that 91% of input tokens are served from the prompt cache, cutting API costs.
  • 03 Uses git worktrees and git notes as a native experience buffer instead of an external database.

Key facts

Prompt Cache Ratio: 91% (self-reported)
RTL Suite Pass Rate: 100% (self-reported)
Total CVDP Tokens Used: 203.9M tokens

Architectural Breakdown of HORIZON

HORIZON defines a hardware design task as a project pack $p = (\pi_{agent}, E_p, A_p, \Gamma_p, \Omega_p)$, whose components are, in order, the agent policy, the executable evaluator, the acceptance predicate, the version control policy, and the domain skills. Evaluating RTL designs requires cycle-accurate execution, simulation feedback, and coverage extraction, which single-turn generation cannot satisfy. HORIZON therefore runs a loop that repeatedly edits the worktree and runs simulations, committing changes only when the acceptance predicate passes.
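
The loop described above can be sketched in a few lines of bash. This is a minimal illustration under stated assumptions, not HORIZON's implementation: `agent_edit`, `evaluate`, and `accept` are hypothetical stand-ins for the agent policy, the evaluator, and the acceptance predicate, and the file name `fifo.v` and branch name `problem/fifo` are invented for the example.

```bash
#!/usr/bin/env bash
# Sketch of an edit -> evaluate -> commit loop inside a git worktree.
# agent_edit, evaluate and accept are HYPOTHETICAL stubs, not HORIZON code.
set -euo pipefail

repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.name "agent"
git -C "$repo" config user.email "agent@example.com"
git -C "$repo" commit -q --allow-empty -m "baseline"

# One worktree per problem keeps each task's edits on its own branch.
wt="$repo-fifo"
git -C "$repo" worktree add -q -b problem/fifo "$wt"

agent_edit() { echo "attempt $1" >> "$wt/fifo.v"; }        # stub: one "edit" per call
evaluate()   { echo $(( 3 - $(wc -l < "$wt/fifo.v") )); }  # stub: prints a mismatch count
accept()     { [ "$1" -eq 0 ]; }                           # stub: pass when nothing mismatches

converged_at=0
for i in 1 2 3 4 5; do
  agent_edit "$i"
  mismatches=$(evaluate)
  # Commit only when the acceptance predicate passes.
  if accept "$mismatches"; then
    git -C "$wt" add fifo.v
    git -C "$wt" commit -q -m "iter $i: accepted"
    converged_at=$i
    break
  fi
done
echo "converged_at=$converged_at"
```

With these stubs the evaluator reports zero mismatches on the third edit, so the worktree branch ends up with exactly one commit on top of the baseline. A real evaluator would run the simulator and parse its output in place of the line count.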

Real-world Evaluation and Token Performance

Tested across legacy hardware suites (ChipBench, RTLLM-2.0, Verilog-Eval) and the CVDP code and verification categories, HORIZON reports a 100% pass rate on every benchmark.

  • Convergence Speed: Verilog-Eval and RTLLM-2.0 converged within 2 iterations, while complex RTL code completion tasks (CID 002) needed up to 82 iterations to resolve bugs.
  • Prompt Caching Efficiency: The CVDP evaluation consumed 203.9 million tokens. Because 91% of these tokens were served from the prompt cache, the actual API cost was far lower than the raw token count implies. This suggests that prompt cache design is a core requirement for repository-scale agents.
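
To make the caching figures concrete, here is a back-of-the-envelope calculation. The token total and the 91% ratio come from the numbers above; the 10% price for a cached token is a hypothetical assumption (actual discounts vary by provider), and the sketch treats all counted tokens as input tokens billed at one rate.

```bash
#!/usr/bin/env bash
# Rough cache arithmetic for the reported CVDP run.
# total_m and hit are the reported figures; discount is an ASSUMED price ratio.
export LC_ALL=C   # force "." as the decimal separator in awk output

total_m=203.9     # total tokens, in millions (reported)
hit=0.91          # fraction served from the prompt cache (self-reported)
discount=0.10     # assumption: a cached token costs 10% of an uncached one

cached_m=$(awk -v t="$total_m" -v h="$hit" 'BEGIN { printf "%.1f", t * h }')
fresh_m=$(awk -v t="$total_m" -v h="$hit" 'BEGIN { printf "%.1f", t * (1 - h) }')
# Cost relative to a run with no caching at all: (1 - h) + h * discount
relative_cost=$(awk -v h="$hit" -v d="$discount" 'BEGIN { printf "%.3f", (1 - h) + h * d }')

echo "cached: ${cached_m}M  uncached: ${fresh_m}M  relative cost: ${relative_cost}"
```

Under these assumptions about 185.5M tokens come from the cache and 18.4M are processed fresh, for a relative cost of 0.181, or roughly 18% of the uncached bill. Changing `discount` to your provider's actual cached-token price gives the corresponding estimate.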

Try it in 2 minutes

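# Review what is staged for this iteration
git diff --cached
# Commit the iteration
git commit -m "iter 7: fix full/empty overlap"
# Attach the evaluation result to that commit (HEAD) as a note
git notes add -m "pass=1 mismatches=0"
# List history; --notes is needed because --oneline hides notes by default
git log --oneline --notes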

✓ When to use

  • When building autonomous agent setups that edit, run, test, and debug massive local repositories iteratively.

✕ When NOT to use

  • When your code changes are small, one-shot generations that need no execution-based feedback loop.

What to do today

  • → Add git notes to your CI test pipelines to log validation metadata directly onto commits.
  • → Structure agent loops to preserve model session context, minimizing cold-start token bills.
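
For the first item, one possible shape of a CI step is sketched below. The notes ref name `ci`, the `ci-bot` identity, and the `key=value` note format are assumptions made for illustration; the throwaway repository only exists to keep the example self-contained.

```bash
#!/usr/bin/env bash
# Sketch: store validation metadata as a git note under a dedicated ref.
# The ref name "ci" and the key=value format are ASSUMPTIONS for illustration.
set -euo pipefail

repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.name "ci-bot"
git -C "$repo" config user.email "ci-bot@example.com"
git -C "$repo" commit -q --allow-empty -m "iter 7: fix full/empty overlap"

# Record the result on HEAD; -f overwrites the note if the job is re-run.
git -C "$repo" notes --ref=ci add -f -m "pass=1 mismatches=0" HEAD

# Read the note back, e.g. to check whether a commit is a known-good baseline.
note=$(git -C "$repo" notes --ref=ci show HEAD)
echo "$note"

# Notes are not pushed with branches; a real pipeline would push the ref itself:
#   git push origin refs/notes/ci
```

Keeping CI results under their own ref (`refs/notes/ci` here) avoids colliding with notes that developers add by hand, and `git log --notes=ci` displays them next to each commit.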

Sources

  • NVIDIA HORIZON: Hands-Free RTL Agent