Tools & releases

Dupehound: Offline and Deterministic Code Duplicate Detector for Agentic Codebases

June 13, 2026 6 min read

Curated by Oleksandr Kuzmenko, AI Product Engineer. Updated June 13, 2026.

AI-assisted · editor-reviewed

Dupehound is a fast, local command-line tool that fingerprints abstract syntax tree (AST) structure to catch duplicate functions written by AI agents. By running it in continuous integration pipelines or feeding its output back to large language models, developers can limit code duplication and context bloat.

Impact: High

Why it matters

Dupehound tackles agent-induced code bloat locally and deterministically, without spending API tokens or relying on heavy machine learning models.

TL;DR

01. Solves AI agent code-bloat by structurally fingerprinting codebases using tree-sitter ASTs.
02. Runs entirely offline and deterministically, scanning millions of lines in seconds (3.6s for VS Code).
03. Integrates into CI via pre-commit hooks or GitHub Actions to block duplicate PRs.
04. Feeds structural warnings directly to coding agents via CLAUDE.md to enforce code reuse.

Key facts

Supported Languages: TypeScript, TSX, JavaScript, Python, Rust, Go, Java, Ruby, Swift
Scan Speed (VS Code 2.97M lines): 3.6s on a standard laptop
Minimum Token Threshold: 40 normalized tokens per function
Exit Codes for CI Check: 0 clean, 1 findings, 2 error

AST-Based Structural Fingerprinting

Unlike text-based search, dupehound works on the abstract syntax tree: it drops comments and replaces identifiers, strings, and numbers with sentinels before fingerprinting. It hashes k-grams of 10 tokens with rolling hashes and applies robust winnowing, which guarantees that any shared sequence of 17 normalized tokens is caught. Similarity is calculated using the exact Jaccard index, and matching clusters are built with union-find.
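
The pipeline described above can be sketched in a few lines of Python. This is an illustration of the general technique, not dupehound's actual implementation. It assumes the input is already a list of normalized tokens, substitutes a plain CRC32 for the rolling hash, uses basic winnowing over hash values rather than the robust variant, and assumes the Jaccard index is taken over fingerprint sets. The window size of 8 is inferred from the standard winnowing relation (guarantee length = window + k - 1, so 17 = 8 + 10 - 1), and the `threshold` parameter is hypothetical.

```python
import zlib
from itertools import combinations

K = 10                      # k-gram length, as stated in the article
GUARANTEE = 17              # shared-run length the article says is always caught
WINDOW = GUARANTEE - K + 1  # 8, assuming the standard relation t = w + k - 1


def kgram_hashes(tokens, k=K):
    """Hash every run of k consecutive normalized tokens (CRC32, not a rolling hash)."""
    return [
        zlib.crc32("\x00".join(tokens[i:i + k]).encode("utf-8"))
        for i in range(len(tokens) - k + 1)
    ]


def winnow(hashes, w=WINDOW):
    """Keep the minimum hash of every window of w consecutive hashes."""
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def fingerprint(tokens):
    """Fingerprint set for one function's normalized token stream."""
    return winnow(kgram_hashes(tokens))


def jaccard(a, b):
    """Exact Jaccard index of two fingerprint sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(fingerprints, threshold):
    """Group functions whose pairwise similarity meets the threshold (union-find)."""
    parent = {name: name for name in fingerprints}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if jaccard(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), set()).add(name)
    # Only groups with at least two members count as duplicate clusters.
    return [members for members in groups.values() if len(members) > 1]
```

Why the guarantee holds in this sketch: a shared run of 17 tokens contains 8 consecutive 10-token k-grams, which fill one complete window in both functions, so the same minimum hash is selected on both sides.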

CLI Commands and Integration

Dupehound provides three core commands:

`dupehound scan [path]` scans a directory, ranks duplicate clusters by deletable lines, and outputs a 'slop score' representing the percentage of redundant code.
`dupehound history` reads git blobs directly from the object database without checking out files, mapping out exactly when duplication spiked over time.
`dupehound check` operates as a CI gate or pre-commit hook. It indexes the codebase at the base git revision and analyzes only the newly added or modified functions, exiting with code 1 upon discovering duplicates.
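
As a rough illustration of using `dupehound check` as a gate, the Python wrapper below runs the command and translates the exit codes listed in the key facts (0 clean, 1 findings, 2 error). The function names `describe` and `run_gate` are hypothetical, and treating a missing binary as exit code 2 is an assumption of this sketch, not documented dupehound behavior.

```python
import subprocess
import sys

# Exit-code meanings as listed in the article's key facts.
MEANINGS = {0: "clean", 1: "findings", 2: "error"}


def describe(code):
    """Translate an exit code into the label used in the article."""
    return MEANINGS.get(code, "unknown")


def run_gate(command=("dupehound", "check")):
    """Run the check and return (exit_code, meaning, combined_output)."""
    try:
        result = subprocess.run(list(command), capture_output=True, text=True)
    except FileNotFoundError:
        # Assumption of this sketch: a missing binary is reported as an error (2).
        return 2, describe(2), f"{command[0]} not found on PATH\n"
    return result.returncode, describe(result.returncode), result.stdout + result.stderr


if __name__ == "__main__":
    code, meaning, output = run_gate()
    print(output, end="")
    print(f"dupehound check: {meaning} (exit code {code})", file=sys.stderr)
    # Propagate the exit code so CI or a pre-commit hook fails on findings.
    sys.exit(code)
```

Saved as a script and called from a pre-commit hook or CI step, this keeps the original exit code intact, so any runner that fails on non-zero status will block the change.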

Prompting Agents for Code Reuse

To keep agents from generating duplicate code, developers can pipe the output of `dupehound check` directly to their agent. Placing the output or guidelines in a `CLAUDE.md` or `AGENTS.md` file directs the agent to inspect the existing original function and refactor its code to reuse it rather than commit new redundant blocks.
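
One possible way to wire this up is a small helper that writes the latest check output into the instruction file. The sketch below treats the report as opaque text, since the exact output format of `dupehound check` is not described here. The marker comments, heading, wording, and function names are all hypothetical choices for illustration.

```python
from pathlib import Path

# Hypothetical markers so the section can be replaced on later runs.
START = "<!-- dupehound:start -->"
END = "<!-- dupehound:end -->"


def render_section(report):
    """Wrap raw check output in a short instruction block for the agent."""
    lines = [
        START,
        "## Duplicate-code findings",
        "",
        "Before adding a new function, check the findings below and reuse",
        "the existing original instead of writing a near-copy.",
        "",
    ]
    # Indent the report so Markdown renders it verbatim.
    lines += ["    " + line for line in report.splitlines()]
    lines += ["", END]
    return "\n".join(lines)


def upsert_section(path, report):
    """Insert or replace the findings section in an instruction file."""
    path = Path(path)
    text = path.read_text() if path.exists() else ""
    section = render_section(report)
    if START in text and END in text:
        # Replace the previous section, keeping everything around it.
        head, rest = text.split(START, 1)
        _, tail = rest.split(END, 1)
        new_text = head + section + tail
    else:
        separator = "\n\n" if text.strip() else ""
        new_text = text.rstrip("\n") + separator + section + "\n"
    path.write_text(new_text)
    return new_text
```

Calling `upsert_section("AGENTS.md", report)` after each check keeps a single, current findings section in the file instead of appending a new one every run.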

Try it in 2 minutes

brew install rafaelpta/dupehound/dupehound
dupehound scan .
dupehound check

When to use

When working heavily with agentic integrated development environments like Claude Code, Cursor, or specialized code generation agents.
When you need a deterministic, reproducible merge gate for CI without network or API key dependencies.
When refactoring a large legacy codebase written in one of the supported languages.

What to do today

Install dupehound locally via Homebrew: `brew install rafaelpta/dupehound/dupehound`
Run `dupehound scan .` on your active project to check your current 'slop score'.
Configure a pre-commit hook or GitHub Action using `dupehound check` to fail on duplicate logic.
Update your `CLAUDE.md` or `AGENTS.md` instruction files so the agent reads `dupehound check` output and reuses existing functions.

#dupehound #Claude Code #Cursor #tree-sitter

Sources

dupehound GitHub Repository
