
Dupehound: Offline and Deterministic Code Duplicate Detector for Agentic Codebases

June 13, 2026 · 6 min read
Curated by Oleksandr Kuzmenko, AI Product Engineer · Updated June 13, 2026 · Sources cited on every story
AI-assisted · editor-reviewed
Tools & releases

Dupehound is a fast, local command-line tool that fingerprints Abstract Syntax Tree (AST) structure to catch duplicate functions written by AI agents. Developers can run it in continuous integration pipelines or feed its output back to a large language model to limit code duplication and context bloat.

Impact: High

Why it matters

It tackles agent-induced code bloat locally and deterministically, without spending API tokens or relying on heavy machine learning models.

TL;DR

  • Solves AI agent code-bloat by structurally fingerprinting codebases using tree-sitter ASTs.
  • Runs entirely offline and deterministically, scanning millions of lines in seconds (3.6s for VS Code).
  • Integrates into CI via pre-commit hooks or GitHub Actions to block duplicate PRs.
  • Feeds structural warnings directly to coding agents via CLAUDE.md to enforce code reuse.

Key facts

  • Supported languages: TypeScript, TSX, JavaScript, Python, Rust, Go, Java, Ruby, Swift
  • Scan speed (VS Code, 2.97M lines): 3.6s on a standard laptop
  • Minimum token threshold: 40 normalized tokens per function
  • Exit codes for CI check: 0 clean, 1 findings, 2 error

AST-Based Structural Fingerprinting

Unlike text-based search, dupehound drops comments, replaces identifiers, strings, and numbers with sentinel tokens, and fingerprints the resulting abstract syntax tree structure. It hashes k-grams of 10 tokens with a rolling hash and applies robust winnowing, which guarantees that any shared sequence of at least 17 normalized tokens is caught. Similarity is computed with the exact Jaccard index, and matching functions are grouped into clusters via union-find.
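The pipeline described above can be sketched in a few lines of Python. This is an illustration of the general technique, not dupehound's actual code: it assumes the token streams are already normalized, uses `zlib.crc32` in place of a real rolling hash, assumes a winnowing window of 8 (the standard winnowing guarantee covers shared runs of k + w - 1 tokens, and 10 + 8 - 1 = 17), and uses a hypothetical similarity threshold of 0.8. All function names are made up for the example.

```python
import zlib
from itertools import combinations

K = 10  # k-gram length in tokens (from the article)
W = 8   # window size; an assumption chosen so that K + W - 1 = 17


def kgram_hashes(tokens, k=K):
    """Hash every run of k consecutive normalized tokens.

    zlib.crc32 stands in for a rolling hash: slower, but deterministic.
    """
    return [
        zlib.crc32("\x00".join(tokens[i:i + k]).encode("utf-8"))
        for i in range(len(tokens) - k + 1)
    ]


def winnow(hashes, w=W):
    """Keep the minimum hash from every window of w consecutive hashes."""
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def jaccard(a, b):
    """Exact Jaccard index of two fingerprint sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(fingerprints, threshold=0.8):
    """Group functions whose fingerprints are similar, using union-find.

    The 0.8 threshold is a hypothetical value for this sketch.
    """
    parent = {name: name for name in fingerprints}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if jaccard(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), []).append(name)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Two functions that differ only in names and literals normalize to the
# same token stream; a third, unrelated function does not.
body = ("def ID ( ID , ID ) : ID = NUM for ID in ID : "
        "if ID > ID : ID += ID * NUM return ID").split()
other = ("class ID : def ID ( ID ) : return ID . ID ( STR ) "
         "or ID [ NUM ] and not ID . ID").split()
functions = {"parse_v1": body, "parse_v2": list(body), "render": other}
fingerprints = {n: winnow(kgram_hashes(t)) for n, t in functions.items()}
print(cluster(fingerprints))
```

Because every step is a pure function of the token stream, the same input always yields the same clusters, which is the property that makes this kind of check usable as a reproducible merge gate.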

CLI Commands and Integration

Dupehound provides three core commands:

  • dupehound scan [path] scans a directory, ranks duplicate clusters by deletable lines, and outputs a 'slop score' representing the percentage of redundant code.
  • dupehound history reads git blobs directly from the object database without checking out files, showing when duplication spiked in the project's history.
  • dupehound check operates as a CI gate or pre-commit hook. It indexes the codebase at the base git revision and analyzes only the newly added or modified functions, exiting with code 1 upon discovering duplicates.
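As a rough illustration of the CI gate, the wrapper below runs `dupehound check` with no arguments and maps its exit code to the meanings listed under Key facts (0 clean, 1 findings, 2 error). The function name `run_check` is hypothetical, the output format is treated as opaque text, and treating a missing binary as exit code 2 is a choice made for this sketch rather than documented behavior.

```python
import subprocess
import sys

# Exit code meanings as listed in the article's key facts.
MEANINGS = {0: "clean", 1: "findings", 2: "error"}


def run_check(command=("dupehound", "check")):
    """Run the check command and return (exit_code, meaning, output)."""
    try:
        result = subprocess.run(list(command), capture_output=True, text=True)
    except FileNotFoundError:
        # Sketch-level assumption: report a missing binary as an error.
        return 2, "error", f"command not found: {command[0]}\n"
    meaning = MEANINGS.get(result.returncode, "unknown")
    return result.returncode, meaning, result.stdout + result.stderr


def main(command=("dupehound", "check")):
    """Print the tool's output and exit with the same code, for use in CI."""
    code, meaning, output = run_check(command)
    print(output, end="")
    print(f"dupehound check: {meaning} (exit code {code})", file=sys.stderr)
    sys.exit(code)
```

Calling `main()` from a pre-commit script would pass the tool's exit code straight through, so any non-zero result blocks the commit or fails the job.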

Prompting Agents for Code Reuse

To keep agents from generating duplicate code, developers can pipe the output of dupehound check directly to their agent. Placing that output, or guidelines derived from it, in a CLAUDE.md or AGENTS.md file directs the agent to inspect the existing original function and refactor its code to reuse it rather than committing new redundant blocks.
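One possible way to wire this up is to keep a single, replaceable section in the instruction file. The sketch below is an assumption about workflow, not something dupehound provides: the marker comments, the heading text, and the `upsert_findings` helper are all hypothetical, and the findings are treated as plain text from `dupehound check`.

```python
from pathlib import Path

# Hypothetical markers that delimit the section this script manages.
BEGIN = "<!-- dupehound:begin -->"
END = "<!-- dupehound:end -->"


def upsert_findings(instructions_path, findings):
    """Insert or replace the findings section in CLAUDE.md or AGENTS.md."""
    path = Path(instructions_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    section = (
        f"{BEGIN}\n"
        "## Duplicate code findings\n"
        "Before adding a new function, reuse the originals named below.\n\n"
        f"{findings.strip()}\n"
        f"{END}"
    )
    start, end = text.find(BEGIN), text.find(END)
    if start != -1 and end > start:
        # Replace the previously written section in place.
        updated = text[:start] + section + text[end + len(END):]
    else:
        # Append a new section after any existing instructions.
        separator = "\n\n" if text.strip() else ""
        updated = text.rstrip("\n") + separator + section + "\n"
    path.write_text(updated, encoding="utf-8")
    return updated
```

Replacing the section instead of appending to it keeps the instruction file from growing on every run, which matters when the goal is to reduce context bloat.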

Try it in 2 minutes

brew install rafaelpta/dupehound/dupehound
dupehound scan .
dupehound check

✓ When to use

  • When working heavily with agentic coding tools such as Claude Code or Cursor, or with specialized code generation agents.
  • When you need a deterministic, reproducible merge gate for CI without network or API key dependencies.
  • When refactoring a large legacy codebase written in one of the supported languages.

What to do today

  • → Install dupehound locally via Homebrew: `brew install rafaelpta/dupehound/dupehound`
  • → Run `dupehound scan .` on your active project to check your current 'slop score'.
  • → Configure a pre-commit hook or GitHub Action that runs `dupehound check` and fails when it finds duplicate logic.
  • → Update your `CLAUDE.md` or `AGENTS.md` instruction files so the agent reads `dupehound check` output and reuses existing functions.
Tags: dupehound, Claude Code, Cursor, tree-sitter

Sources

  • dupehound GitHub Repository