AI coding agents are increasingly useful for tasks that go beyond editing files — researching documentation, testing web applications, scraping data, and interacting with web-based tools. To do this, they need the ability to control a browser. This guide covers the practical options for browser automation in an AI agent context, from simple MCP integrations to fully autonomous web agents.
Table of contents
Open Table of contents
The Model Context Protocol (MCP)
Before diving into specific tools, it helps to understand MCP — the Model Context Protocol. MCP is an open standard (originally developed by Anthropic, now widely adopted) that lets AI models connect to external tools and data sources through a standardized interface.
Think of MCP as a USB port for AI agents. Instead of each agent needing custom integrations with every tool, an MCP server exposes capabilities through a common protocol. Any MCP-compatible agent can use any MCP server.
For browser automation, this means you configure an MCP server once and every compatible agent (Claude Code, Cursor, Continue, etc.) can use it to control browsers.
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["-y", "@playwright/mcp@latest"]
}
}
}
This configuration tells your agent: “You can use this MCP server to control a browser.”
Playwright MCP: The Primary Tool
The Playwright MCP server from Microsoft is the most practical browser automation tool for AI agents. It gives your agent full control of a Chromium, Firefox, or WebKit browser through the MCP protocol.
What It Provides
The server exposes browser actions as MCP tools that your agent can call:
- Navigate to URLs
- Click elements, type text, fill forms
- Take screenshots of pages or specific elements
- Read page content via accessibility tree snapshots (structured, semantic understanding of the page)
- Evaluate JavaScript for advanced interactions
- Manage tabs (open, close, switch between them)
- Handle dialogs (accept/dismiss alerts and confirms)
- Monitor network requests and console messages
Two Modes of Interaction
Playwright MCP supports two approaches for understanding page content:
Snapshot mode (default) uses the browser’s accessibility tree to create a structured representation of the page. The agent gets a semantic view — “there is a button labeled ‘Submit’, a text input for ‘Email’, a navigation menu with 5 links” — rather than raw HTML. This is reliable, lightweight, and works well for most tasks.
Vision mode adds screenshot capabilities. The agent receives both the accessibility tree and visual screenshots, which helps with layout-dependent interactions, visual verification, and debugging rendering issues.
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["-y", "@playwright/mcp@latest", "--caps", "vision"]
}
}
}
Setup
Install the MCP server in your agent’s configuration:
# For Claude Code, add to .claude/settings.json or ~/.claude.json
# For Cursor, add to .cursor/mcp.json
# The config format is the same across agents
Playwright manages its own browser binary. On first run, it downloads a compatible Chromium build:
# If you need to install the browser explicitly
npx playwright install chromium
Practical Example: Testing a Web Application
With Playwright MCP configured, you can ask your agent natural language questions about web pages:
“Navigate to localhost:3000, check if the login form renders correctly, try logging in with test@example.com / password123, and tell me what happens.”
The agent will:
- Open the browser
- Navigate to the URL
- Take a snapshot to understand the page structure
- Fill in the form fields
- Click the submit button
- Report what happened (success page, error message, redirect, etc.)
No Selenium scripts, no CSS selectors — the agent figures out the page structure from the accessibility tree.
Chrome DevTools MCP: Live Browser Debugging
The Chrome DevTools MCP server from Google takes a different approach. Instead of launching a new browser, it connects to your running Chrome instance via the Chrome DevTools Protocol (CDP).
What It Provides
- Performance traces — Core Web Vitals, JavaScript execution costs, layout shifts
- Console messages with source-mapped stack traces
- Network request analysis — request/response headers, timing, payload sizes
- Screenshot capture of the current page
- DOM inspection from your live browser session
When to Use It
Chrome DevTools MCP is best for debugging and analysis of web applications you are actively developing:
- “Why is this page slow? Run a performance trace.”
- “What network requests fire when I click this button?”
- “Are there any console errors on this page?”
Setup
You need Chrome running with remote debugging enabled:
# Launch Chrome with remote debugging
google-chrome --remote-debugging-port=9222
Then configure the MCP server:
{
"mcpServers": {
"chrome-devtools": {
"command": "npx",
"args": ["-y", "@anthropic-ai/chrome-devtools-mcp@latest"]
}
}
}
Playwright MCP vs Chrome DevTools MCP
| Feature | Playwright MCP | Chrome DevTools MCP |
|---|---|---|
| Browser | Launches its own (headless or headed) | Connects to your running Chrome |
| Best for | Automated testing, scraping, form filling | Debugging, performance analysis |
| Session | Isolated (clean cookies, no login state) | Your live session (logged in, cookies intact) |
| Page interaction | Full (navigate, click, type, screenshot) | Limited (mostly read-only analysis) |
| Performance profiling | No | Yes (traces, Web Vitals) |
Use Playwright MCP as your primary tool for browser automation. Use Chrome DevTools MCP when you need to analyze or debug your currently open browser tabs.
Collaborative Browsing: Sharing Your Browser with an Agent
Sometimes you want the agent to see exactly what you are seeing in your browser — your logged-in sessions, your open tabs, the page you are currently debugging.
Chrome Extension + MCP Approach
Browser extensions like BrowserMCP and mcp-chrome bridge your actual browser to your AI agent:
- Install the Chrome extension
- Configure the corresponding MCP server
- Your agent can now see your open tabs, read page content, take screenshots, and interact with pages
{
"mcpServers": {
"mcp-chrome": {
"command": "npx",
"args": ["-y", "mcp-chrome-server"]
}
}
}
mcp-chrome provides 20+ tools including tab management, page navigation, screenshot capture, script injection, network monitoring, and history navigation. It uses your existing Chrome profile, which means the agent has access to your logged-in sessions.
Security Warning
This is the most powerful and most dangerous browser automation approach. When an agent has access to your live browser:
- It can read cookies from any tab, including banking and email sessions
- It can navigate to any URL as you (the authenticated user)
- It can execute JavaScript in the context of any page
- Malicious web content could potentially trick the agent via prompt injection
Mitigations:
- Use a dedicated Chrome profile for agent access (separate from your personal browsing)
- Limit agent access to specific tabs or domains when possible
- Review agent actions before they execute on sensitive pages
Autonomous Web Agents: Browser Use
For tasks that require multi-step autonomous web navigation, Browser Use wraps Playwright in an LLM control loop:
from browser_use import Agent
from langchain_anthropic import ChatAnthropic
agent = Agent(
task="Find the pricing page for Vercel, extract all plan names and prices",
llm=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
)
result = await agent.run()
Browser Use takes a natural language task description, opens a browser, and autonomously navigates, clicks, types, and extracts information until the task is complete. Version 2.0 (released January 2026) achieved 83.3% accuracy on the WebVoyager benchmark.
When to use it:
- Multi-step research tasks (“find and compare X across three websites”)
- Automated data extraction from sites without APIs
- Form filling across multiple pages
- Any task where you would normally point-and-click through a workflow
When not to use it:
- Simple single-page interactions (Playwright MCP is simpler and cheaper)
- Tasks requiring authenticated sessions (security risk)
- High-reliability workflows (autonomous agents can make wrong turns)
Workflow Patterns
Pattern 1: Agent Does Research While You Work
The agent opens a headless browser, navigates to documentation or API references, extracts the information it needs, and returns results to the conversation. Your browsing is not interrupted.
This is the default behavior with Playwright MCP — it launches a separate browser instance.
Pattern 2: Inspect What You Are Seeing
You are debugging a web page. You launch Chrome with --remote-debugging-port=9222, open the problematic page, and ask your agent to analyze it via Chrome DevTools MCP. The agent can inspect the DOM, check console errors, run performance traces, and suggest fixes.
Pattern 3: Shared Tab Browsing
You install a browser extension MCP server. The agent can see all your open tabs, switch between them, and interact with pages. Useful for collaborative debugging and guided web tasks where you want to point the agent at specific content.
Pattern 4: Isolated Sandbox Browsing
For interacting with untrusted content or automated testing, run the browser in a Docker container:
# Browserless provides headless Chrome in Docker
docker run -d --name browserless \
-p 3100:3000 \
--shm-size=2g \
ghcr.io/browserless/chromium:latest
Connect Playwright to the containerized browser instead of a local one. The container is isolated — no access to your filesystem, cookies, or sessions.
Desktop GUI Automation (Bonus)
Browser automation covers most needs, but sometimes you need to interact with the desktop itself — clicking buttons in native applications, taking screenshots of the full desktop, or typing into non-browser windows.
On Linux with Wayland (which is the default on most modern distributions), the primary tool for this is ydotool:
sudo apt install ydotool
# Type text into the focused window
ydotool type "Hello World"
# Move mouse and click
ydotool mousemove 100 200
ydotool click 0xC0
ydotool works at the kernel level (/dev/uinput), making it compositor-agnostic — it works on Wayland, X11, and even the Linux framebuffer.
For a more integrated approach, Anthropic’s Computer Use runs a full desktop environment inside a Docker container that Claude can control:
docker run -d \
--name claude-computer-use \
-p 8080:8080 \
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
This gives the agent a complete virtual desktop with mouse and keyboard control, useful for testing native applications or complex GUI workflows.
Choosing the Right Approach
| Scenario | Recommended Tool |
|---|---|
| Test a web app you are building | Playwright MCP |
| Debug a live web page | Chrome DevTools MCP |
| Autonomous multi-step web research | Browser Use |
| Read documentation during coding | Playwright MCP |
| Share your browser session with agent | mcp-chrome extension |
| Interact with untrusted content | Dockerized browser (Browserless) |
| Automate native desktop applications | ydotool or Anthropic Computer Use |
Start with Playwright MCP — it covers the majority of use cases, runs in isolation by default, and works with every major AI coding agent. Add the other tools as specific needs arise.