NVIDIA HORIZON uses git worktrees and prompt caching to build hardware agents
NVIDIA Research has introduced HORIZON, an agentic framework for RTL design that reports a 100% pass rate on the hardware benchmarks it was evaluated on. By hosting each problem as a git worktree and keeping model sessions persistent, it reports that 91% of input tokens are served from the prompt cache.
Impact: Medium
Why it matters
Adopt HORIZON's pattern of reusing active model sessions and keeping stable codebase sources in the cached part of the prompt to reduce the cost of multi-step agents.
TL;DR
- Achieves 100% completion on RTL benchmarks using iterative, state-preserving loops.
- Reuses model sessions to run 91% of input tokens through prompt caching, cutting API costs.
- Leverages git worktrees and git notes as a native experience buffer instead of an external database.
Key facts
- Prompt Cache Ratio: 91% (self-reported)
- RTL Suite Pass Rate: 100% (self-reported)
- Total CVDP Tokens Used: 203.9M tokens
Architectural Breakdown of HORIZON
HORIZON defines a hardware design task as a project pack $p = (\pi_{agent}, E_p, A_p, \Gamma_p, \Omega_p)$, whose components are, respectively, the agent policy, the executable evaluator, the acceptance predicate, the version control policy, and the domain skills. Evaluating RTL designs requires cycle-accurate execution, simulation feedback, and coverage extraction. Because a single-turn generation cannot incorporate that feedback, the loop repeatedly edits the worktree, runs simulations, and commits changes only when the acceptance predicate passes.
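The edit, evaluate, accept, commit cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not HORIZON's implementation: the `ProjectPack` class, its field names, the `run_loop` function, and the choice to stop at the first accepted iteration are all assumptions made for the example, and the worktree is reduced to a plain string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Report = Dict[str, float]  # hypothetical metrics, e.g. {"mismatches": 0.0}


@dataclass
class ProjectPack:
    """Hypothetical stand-in for p = (pi_agent, E_p, A_p, Gamma_p, Omega_p)."""
    agent: Callable[[str, Optional[Report], List[str]], str]  # pi_agent: proposes the next edit
    evaluate: Callable[[str], Report]                          # E_p: runs the evaluator, returns metrics
    accept: Callable[[Report], bool]                           # A_p: acceptance predicate
    commit: Callable[[str, Report], None]                      # Gamma_p: version control policy
    skills: List[str]                                          # Omega_p: domain skills


def run_loop(pack: ProjectPack, source: str, max_iters: int = 100) -> Optional[int]:
    """Edit and evaluate repeatedly; commit only when the predicate passes.

    Returns the 1-based iteration that was accepted, or None if the
    iteration budget ran out first.
    """
    report: Optional[Report] = None
    for i in range(1, max_iters + 1):
        source = pack.agent(source, report, pack.skills)  # edit the (toy) worktree
        report = pack.evaluate(source)                    # simulation feedback
        if pack.accept(report):
            pack.commit(source, report)                   # commit only on acceptance
            return i
    return None


if __name__ == "__main__":
    # Toy agent: fixes one "BUG" marker per iteration.
    committed: List[str] = []
    toy = ProjectPack(
        agent=lambda src, rep, skills: src.replace("BUG", "ok", 1),
        evaluate=lambda src: {"mismatches": float(src.count("BUG"))},
        accept=lambda rep: rep["mismatches"] == 0,
        commit=lambda src, rep: committed.append(src),
        skills=["verilog"],
    )
    print(run_loop(toy, "BUG BUG BUG"), committed)
```

Passing the previous report back into the agent is what distinguishes this from a single-turn generation: each edit can react to the last simulation result.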
Real-world Evaluation and Token Performance
Tested across legacy hardware suites (ChipBench, RTLLM-2.0, Verilog-Eval) and CVDP code/verification categories, HORIZON reports a 100% pass rate on every benchmark.
- Convergence Speed: While Verilog-Eval and RTLLM-2.0 converged within just 2 iterations, complex RTL code completion tasks (CID 002) required up to 82 iterations to resolve bugs.
- Prompt Caching Efficiency: The CVDP evaluation consumed 203.9 million tokens. Because 91% of these tokens were served from the prompt cache, the actual API cost was substantially lower than the raw token count implies. This suggests that prompt cache design is a core requirement for repository-scale agents.
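To make the caching figures concrete, the arithmetic can be worked through as below. Two things here are assumptions rather than facts from the source: that the 91% ratio applies to the full 203.9M tokens, and the `cached_price_ratio` of 0.1 (cached tokens billed at one tenth of the normal input price), which is a hypothetical placeholder since actual discounts vary by provider.

```python
def cache_savings(total_tokens: float, cached_fraction: float,
                  cached_price_ratio: float) -> dict:
    """Split a token total into cached and uncached parts and estimate
    the bill in full-price token equivalents."""
    cached = total_tokens * cached_fraction
    uncached = total_tokens - cached
    # Cached tokens are billed at a fraction of the normal price (assumed).
    effective = uncached + cached * cached_price_ratio
    return {
        "cached_tokens": cached,
        "uncached_tokens": uncached,
        "effective_tokens": effective,
        "relative_cost": effective / total_tokens,
    }


if __name__ == "__main__":
    # 203.9M tokens and 91% come from the article; 0.1 is a hypothetical discount.
    result = cache_savings(203.9e6, 0.91, cached_price_ratio=0.1)
    for key, value in result.items():
        print(f"{key}: {value:,.3f}")
```

Under these assumptions roughly 185.5M tokens are cache hits and about 18.4M are billed at full price, so the bill is about 18% of what the same run would cost with no caching.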
Try it in 2 minutes
git diff --cached
git commit -m "iter 7: fix full/empty overlap"
git notes add -m "pass=1 mismatches=0"
git log --oneline
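If an agent harness drives these commands from Python, one option is to build the argument lists in small helpers and hand them to `subprocess.run`. The sketch below is illustrative only: the helper names, the `worktrees/` directory, and the `problem/<id>` branch naming are hypothetical and not taken from HORIZON. It relies on standard git behavior: `git worktree add -b <branch> <path>` creates a new branch checked out in its own directory, and `git notes add -f -m <msg> <commit>` attaches (or overwrites) a note on a commit.

```python
import shlex
from typing import Dict, List


def worktree_add_cmd(problem_id: str, root: str = "worktrees") -> List[str]:
    """Command that gives one problem its own branch and working directory.

    The directory layout and branch naming are hypothetical.
    """
    return ["git", "worktree", "add", "-b", f"problem/{problem_id}",
            f"{root}/{problem_id}"]


def notes_add_cmd(metrics: Dict[str, int], commit: str = "HEAD") -> List[str]:
    """Command that stores key=value metrics as a git note on a commit."""
    message = " ".join(f"{key}={value}" for key, value in metrics.items())
    # -f overwrites an existing note on the same commit.
    return ["git", "notes", "add", "-f", "-m", message, commit]


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(cmd, check=True) inside a real repository to execute.
    print(shlex.join(worktree_add_cmd("cid002")))
    print(shlex.join(notes_add_cmd({"pass": 1, "mismatches": 0})))
```

Notes attached this way are not shown by `git log --oneline` unless `--notes` is also passed; `git notes show <commit>` prints the note for a single commit.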
✓ When to use
- When building autonomous agents that iteratively edit, run, test, and debug large local repositories.
✕ When NOT to use
- When your code changes are small, one-shot generations that do not require extensive execution-based feedback loops.
What to do today
- Use git notes in your CI test pipelines to attach validation metadata directly to commits.
- Structure agent loops to preserve model session context, minimizing cold-start token bills.
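One way to read the session-preservation advice is to keep the message history append-only, so each request repeats the previous one as an exact prefix. The sketch below assumes the provider's cache matches on an exact leading prefix, which is how prefix caches commonly work; the `Session` class and `shared_prefix_fraction` helper are hypothetical, and word count stands in as a crude proxy for tokens.

```python
from typing import List


class Session:
    """Append-only message history: earlier content is never rewritten."""

    def __init__(self, stable_prefix: List[str]) -> None:
        # Stable content (system prompt, codebase sources) goes first.
        self.messages: List[str] = list(stable_prefix)

    def request(self, new_turn: str) -> List[str]:
        """Append one turn and return the full message list to send."""
        self.messages.append(new_turn)
        return list(self.messages)


def shared_prefix_fraction(previous: List[str], current: List[str]) -> float:
    """Fraction of `current` (by word count) that repeats `previous`
    as an exact leading prefix, i.e. the part a prefix cache could reuse."""
    shared = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # the first changed message ends the reusable prefix
        shared += len(new.split())
    total = sum(len(message.split()) for message in current)
    return shared / total if total else 0.0


if __name__ == "__main__":
    session = Session(["system prompt text", "module fifo source code here"])
    first = session.request("iter 1 log")
    second = session.request("iter 2 log")
    print(shared_prefix_fraction([], first))      # cold start
    print(shared_prefix_fraction(first, second))  # warm session
```

Rewriting or reordering an early message, for example by regenerating the system prompt on every iteration, drops the reusable fraction to zero in this model, which is why the stable material belongs at the front.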
Sources