Alibaba Open-Sources Page Agent for Direct Client-Side Document Object Model Web Automation
Alibaba has open-sourced Page Agent, a client-side TypeScript library that executes natural language browser commands directly in the live page. It reads the DOM as text via DOM dehydration, bypassing external drivers like Playwright.
Impact: Medium
Why it matters
Traditional browser automation tools drive the browser from an external process, which incurs rendering and processing overhead. Running inside the page's own context removes that extra layer and offers a direct path to building secure, context-aware in-app copilots.
TL;DR
- Page Agent runs inside the webpage as plain JavaScript and reads the live DOM as text.
- DOM dehydration compresses raw HTML into a FlatDomTree to make execution cheaper and more precise.
- Since it executes inside the client session, it inherits existing user authentication and cookies without backend overhead.
Key facts
- License: MIT
- Primary language: TypeScript
- Compatible endpoints: Any OpenAI-compatible API
How DOM Dehydration Works
Raw HTML is verbose and expensive for LLMs. Page Agent solves this by scanning the Document Object Model (DOM) and building a FlatDomTree containing only interactive nodes (buttons, inputs, links). Every interactive item gets indexed, which allows the text-only LLM to emit precise actions based on element indices instead of complex selector strings.
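The idea can be sketched in a few lines of TypeScript. This is a minimal illustration, not Page Agent's actual implementation: the `SimpleNode` and `FlatEntry` types, the `dehydrate` and `serialize` functions, and the list of interactive tags are all hypothetical names chosen for the example, and a plain object tree stands in for the real DOM so the sketch runs outside a browser.

```typescript
// Hypothetical stand-in for a DOM element; not a Page Agent type.
interface SimpleNode {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: SimpleNode[];
}

// One indexed entry in the flattened view handed to the LLM (assumed shape).
interface FlatEntry {
  index: number;
  tag: string;
  label: string;
}

// Assumption: these tags count as "interactive" for the sketch.
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

// Gather the visible text of a node and its descendants.
function collectText(node: SimpleNode): string {
  const own = node.text ?? "";
  const nested = (node.children ?? []).map(collectText).join(" ");
  return `${own} ${nested}`.replace(/\s+/g, " ").trim();
}

// Walk the tree depth-first and keep only interactive nodes, numbering them in order.
function dehydrate(root: SimpleNode): FlatEntry[] {
  const flat: FlatEntry[] = [];
  const visit = (node: SimpleNode): void => {
    const tag = node.tag.toLowerCase();
    if (INTERACTIVE_TAGS.has(tag)) {
      const label =
        node.attrs?.["aria-label"] ?? node.attrs?.["placeholder"] ?? collectText(node);
      flat.push({ index: flat.length, tag, label });
      return; // treat an interactive node as a leaf
    }
    (node.children ?? []).forEach(visit);
  };
  visit(root);
  return flat;
}

// Render the flat list as compact text for a text-only model.
function serialize(flat: FlatEntry[]): string {
  return flat.map((e) => `[${e.index}] <${e.tag}> ${e.label}`.trimEnd()).join("\n");
}

const samplePage: SimpleNode = {
  tag: "body",
  children: [
    { tag: "h1", text: "Sign in" },
    { tag: "input", attrs: { placeholder: "Email" } },
    { tag: "div", children: [{ tag: "button", children: [{ tag: "span", text: "Log in" }] }] },
    { tag: "a", text: "Forgot password?", attrs: { href: "/reset" } },
  ],
};

const flatView = dehydrate(samplePage);
const llmView = serialize(flatView);
// llmView:
// [0] <input> Email
// [1] <button> Log in
// [2] <a> Forgot password?

// The model can then answer with an index rather than a selector string.
const modelReply = { action: "click", index: 1 };
const targetEntry = flatView[modelReply.index];
```

The heading and the wrapper `div` never reach the model, and the reply `{ action: "click", index: 1 }` resolves to the "Log in" button without any CSS or XPath selector.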
Session Native Execution
Since the agent executes in the browser's context:
- Session Inheritance: The agent acts within the user's authenticated session, so cookies and existing session state carry over.
- Visual Feedback: The agent supports optional visual feedback during execution through a SimulatorMask component.
Enterprise Security Measures
Page Agent integrates operation allowlists, letting you lock down which actions the agent can perform. It also supports data-masking patterns to sanitize passwords and sensitive fields before they leave the client browser for the LLM endpoint.
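A rough sketch of how these two guards might fit together is shown below. It assumes nothing about Page Agent's real configuration surface: the `AgentAction` and `FieldSnapshot` types, the `isAllowed`, `maskText`, and `maskField` functions, and the two regular expressions are hypothetical and only illustrate the pattern of filtering actions and redacting text on the client.

```typescript
// Hypothetical action vocabulary; the real library's action names may differ.
type ActionName = "click" | "type" | "scroll" | "navigate";

interface AgentAction {
  name: ActionName;
  index?: number;
  value?: string;
}

// Allowlist: only these actions may be executed in this deployment (example policy).
const ALLOWED_ACTIONS: ReadonlySet<ActionName> = new Set<ActionName>(["click", "type"]);

function isAllowed(action: AgentAction): boolean {
  return ALLOWED_ACTIONS.has(action.name);
}

// Example masking patterns (assumptions): card-like digit runs and email addresses.
const MASK_PATTERNS: RegExp[] = [
  /\b\d{13,16}\b/g,
  /[\w.+-]+@[\w-]+\.[\w.-]+/g,
];

// Replace every match of every pattern before text is sent to the LLM endpoint.
function maskText(text: string): string {
  return MASK_PATTERNS.reduce((acc, re) => acc.replace(re, "[REDACTED]"), text);
}

// Minimal snapshot of a form field (hypothetical shape).
interface FieldSnapshot {
  type: string;
  value: string;
}

// Password fields are always redacted; other fields go through the patterns.
function maskField(field: FieldSnapshot): string {
  if (field.type === "password") return "[REDACTED]";
  return maskText(field.value);
}

const blocked = isAllowed({ name: "navigate" }); // false under the example policy
const masked = maskText("Contact jane@example.com card 4111111111111111");
// masked: "Contact [REDACTED] card [REDACTED]"
```

Running both checks in the page, before any request is built, means a disallowed action is rejected locally and sensitive values never appear in the prompt.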
✓ When to use
- When building in-app copilots or multi-step form-filling assistance on web applications you control.
✕ When NOT to use
- When you need to automate across multiple windows or browser tabs without a browser extension.
What to do today
- Embed Page Agent into internal dashboards to automate repetitive workflows natively.