Skip to content
ATAI Today Brief
HomeNewsConceptsGuidesToolbox
AboutSubscribeUA
Subscribe

AI Today Brief

The daily AI-engineering brief. Built in public. EN · UA.

XTelegramLinkedInYouTubeRSS
NewsConceptsGuidesSubscribeAdvertiseAboutEditorial policyAI disclosurePrivacyTerms

© 2026 AI Today Brief. All rights reserved.

  1. Home/
  2. News/
  3. Agents & MCP/
  4. MosaicLeaks: Detecting Privacy Leaks in Research Agents
Agents & MCP

MosaicLeaks: Detecting Privacy Leaks in Research Agents

June 19, 2026· 6 min read
OKCurated by Oleksandr Kuzmenko, AI Product Engineer·Updated June 19, 2026·Sources cited on every story
AI-assisted · editor-reviewed·How we use AI
MosaicLeaks: Detecting Privacy Leaks in Research Agents

Researchers found that agents often leak internal data by interleaving private context into public web search queries. The new PA-DR training method reduces this leakage by penalizing queries that reveal proprietary fragments.

Impact: Medium

Why it matters

If your agent architecture involves local document retrieval combined with web search, it is likely leaking snippets of your private data to search providers.

TL;DR

  • 01Informing agents about specific local facts during search increases the likelihood of data leakage.
  • 02Simple system prompts to 'be private' are ineffective and reduce task accuracy.
  • 03PA-DR training balances task performance with privacy protection by evaluating queries individually.

Key facts

Original strict chain success48.7%
PA-DR strict chain success58.7%
Original leakage rate34.0%
PA-DR leakage rate9.9%
Original strict chain success
48.7%
PA-DR strict chain success
58.7%
Original leakage rate
34.0%
PA-DR leakage rate
9.9%

The Mechanism of Mosaic Leakage

The core issue is the interleaving of local knowledge with external search. When an agent queries a search engine for 'MediConn 70% migration', it might appear innocuous, but when aggregated with previous queries about security disclosures, an adversary can deduce internal business facts. Researchers identified three levels of leakage: Intent (inferring research goals), Answer (answering private questions), and Full-Information (stating true private facts without prior context).

Challenges in Mitigation

Simply prompting the agent to 'not leak data' is insufficient. For models like Qwen3-4B, this intervention often decreases task performance (strict chain success dropping from 48.7% to 44.5%) without providing consistent protection. Training for higher task performance actually worsens the problem; models learn to include more specific context in search queries to improve retrieval results, creating richer 'leakage' paths.

The PA-DR Solution

Privacy-Aware Deep Research (PA-DR) uses a granular reward system. It assigns a privacy cost at the exact moment of planning. If a query is estimated to leak information directly or contribute to a mosaic leak, the model is penalized. This allows the model to find a balance between retrieval quality and data safety.

✓ When to use

  • For agents performing deep research on proprietary internal documents.
  • In enterprise systems requiring strict data isolation.

What to do today

  • →Audit the outbound search queries of your research agents for sensitive entity references.
  • →Limit the amount of context passed to web search functions in agent tools.
#Qwen3-4B

Sources

  • MosaicLeaks: Can your research agent keep a secret?
ShareShare on XShare on LinkedIn

Related stories

  • Agents & MCPDynamic Tool Retrieval for AI Agents: Solving Context Bloat with Vector Search

Email digest

Get the morning AI brief

One email a day — the stories that matter for engineers, founders and tech leads. Human-edited, with links to primary sources.

  • ✓120+ sources scanned daily
  • ✓Edited by a human
  • ✓1 email per day
  • ✓EN + UA

By subscribing you agree to the privacy policy.