AI Lab Notes

Browser Automation for AI Agents: MCP, Playwright, and Beyond

AI coding agents are increasingly useful for tasks that go beyond editing files — researching documentation, testing web applications, scraping data, and interacting with web-based tools. To do this, they need the ability to control a browser. This guide covers the practical options for browser automation in an AI agent context, from simple MCP integrations to fully autonomous web agents.

The Model Context Protocol (MCP)

Before diving into specific tools, it helps to understand MCP — the Model Context Protocol. MCP is an open standard (originally developed by Anthropic, now widely adopted) that lets AI models connect to external tools and data sources through a standardized interface.

Think of MCP as a USB port for AI agents. Instead of each agent needing custom integrations with every tool, an MCP server exposes capabilities through a common protocol. Any MCP-compatible agent can use any MCP server.
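
Concretely, the common protocol is JSON-RPC 2.0. An agent invokes a server capability by sending a `tools/call` request that names the tool and passes its arguments, and over the stdio transport each message is a single line of JSON. Below is a minimal sketch of that framing using only the standard library. `browser_navigate` is just an example tool name (it matches Playwright MCP's naming at the time of writing). A real client must also complete MCP's `initialize` handshake first, which SDKs such as the official `mcp` Python package handle for you.

```python
import json
from itertools import count

# Monotonic request IDs; JSON-RPC uses them to match responses to requests.
_request_ids = count(1)


def tools_call_request(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request that invokes an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def encode_for_stdio(message: dict) -> bytes:
    """Serialize one message for the stdio transport as a single JSON line.

    json.dumps escapes newlines inside strings, so the only raw newline
    is the delimiter appended here.
    """
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


if __name__ == "__main__":
    req = tools_call_request("browser_navigate", {"url": "https://example.com"})
    print(encode_for_stdio(req).decode("utf-8"), end="")
```

Because every server speaks this same request shape, an agent only needs one client implementation to talk to any of them.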

For browser automation, this means you configure an MCP server once and every compatible agent (Claude Code, Cursor, Continue, etc.) can use it to control browsers.

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp@latest"]
    }
  }
}

This configuration tells your agent: “You can use this MCP server to control a browser.”
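
Because the configuration is plain JSON, you can manage it from a script. The following minimal sketch adds or updates one entry under `mcpServers` in an existing config file while leaving every other key untouched. `add_mcp_server` is a hypothetical helper, not part of any agent, and the file path you pass depends on which agent you use.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Insert or replace one entry under "mcpServers", keeping everything else."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    import tempfile

    # Demonstrate on a throwaway file rather than a real agent config.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "mcp.json"
        add_mcp_server(path, "playwright", "npx", ["-y", "@playwright/mcp@latest"])
        print(path.read_text())
```

Merging rather than overwriting matters because some agent config files hold other settings alongside the server list.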

Playwright MCP: The Primary Tool

The Playwright MCP server from Microsoft is the most practical browser automation tool for AI agents. It gives your agent full control of a Chromium, Firefox, or WebKit browser over MCP.

What It Provides

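The server exposes browser actions as MCP tools that your agent can call, covering navigation, clicking, typing into fields, capturing accessibility snapshots, and taking screenshots.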

Two Modes of Interaction

Playwright MCP supports two approaches for understanding page content:

Snapshot mode (default) uses the browser’s accessibility tree to create a structured representation of the page. The agent gets a semantic view — “there is a button labeled ‘Submit’, a text input for ‘Email’, a navigation menu with 5 links” — rather than raw HTML. This is reliable, lightweight, and works well for most tasks.
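
To see why an accessibility snapshot is easier for a model to work with than raw HTML, here is a toy sketch that flattens a nested accessibility tree (role, accessible name, children) into an indented outline. Each node gets a reference ID that a later action could use to target it. The node structure and output layout are simplified assumptions, loosely modeled on the YAML-like snapshots Playwright MCP returns rather than a copy of its exact format.

```python
from dataclasses import dataclass, field


@dataclass
class AXNode:
    """A simplified accessibility-tree node: a role, an accessible name, children."""
    role: str
    name: str = ""
    children: list["AXNode"] = field(default_factory=list)


def to_outline(root: AXNode) -> str:
    """Render the tree as an indented outline, tagging each node with a ref
    that a later action (click, type) could use to target it."""
    lines: list[str] = []
    next_ref = 1

    def visit(node: AXNode, depth: int) -> None:
        nonlocal next_ref
        label = f' "{node.name}"' if node.name else ""
        lines.append(f"{'  ' * depth}- {node.role}{label} [ref=e{next_ref}]")
        next_ref += 1
        for child in node.children:
            visit(child, depth + 1)

    visit(root, 0)
    return "\n".join(lines)


if __name__ == "__main__":
    page = AXNode("form", "Login", [
        AXNode("textbox", "Email"),
        AXNode("textbox", "Password"),
        AXNode("button", "Submit"),
    ])
    print(to_outline(page))
```

Run as a script, this prints four lines, from `- form "Login" [ref=e1]` down to `  - button "Submit" [ref=e4]`. That is a few dozen tokens where the equivalent HTML, with its classes, wrappers, and inline styles, would usually be far longer.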

Vision mode adds screenshot capabilities. The agent receives both the accessibility tree and visual screenshots, which helps with layout-dependent interactions, visual verification, and debugging rendering issues.

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp@latest", "--caps", "vision"]
    }
  }
}

Setup

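Add the server to your agent's MCP configuration:

# Claude Code: .mcp.json in the project root (shared) or ~/.claude.json (user)
# Or register it from the Claude Code CLI:
claude mcp add playwright npx @playwright/mcp@latest
# Cursor: .cursor/mcp.json (project) or ~/.cursor/mcp.json (global)
# Most agents accept the same "mcpServers" JSON format shown above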

Playwright manages its own browser binary. On first run, it downloads a compatible Chromium build:

# If you need to install the browser explicitly
npx playwright install chromium

Practical Example: Testing a Web Application

With Playwright MCP configured, you can give your agent natural-language instructions about web pages:

“Navigate to localhost:3000, check if the login form renders correctly, try logging in with test@example.com / password123, and tell me what happens.”

The agent will:

  1. Open the browser
  2. Navigate to the URL
  3. Take a snapshot to understand the page structure
  4. Fill in the form fields
  5. Click the submit button
  6. Report what happened (success page, error message, redirect, etc.)

No Selenium scripts, no CSS selectors — the agent figures out the page structure from the accessibility tree.
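
Under the hood, the six steps above become a short sequence of MCP tool calls. The sketch below writes that sequence out as data. The tool names (`browser_navigate`, `browser_snapshot`, `browser_type`, `browser_click`) and their argument names follow Playwright MCP's tool list at the time of writing. The `ref` values are hypothetical placeholders: a real agent reads them from the preceding snapshot.

```python
from typing import Any

# Each step is (tool name, arguments). The refs "e2", "e3", "e4" are
# hypothetical; a real agent takes them from the browser_snapshot result.
LOGIN_PLAN: list[tuple[str, dict[str, Any]]] = [
    ("browser_navigate", {"url": "http://localhost:3000"}),  # also opens the browser
    ("browser_snapshot", {}),  # learn the page structure
    ("browser_type", {"element": "Email textbox", "ref": "e2", "text": "test@example.com"}),
    ("browser_type", {"element": "Password textbox", "ref": "e3", "text": "password123"}),
    ("browser_click", {"element": "Submit button", "ref": "e4"}),
    ("browser_snapshot", {}),  # observe the outcome: success page, error, redirect
]


def describe(plan: list[tuple[str, dict[str, Any]]]) -> list[str]:
    """Human-readable summary of the plan, one line per tool call."""
    return [f"{i}. {tool}({', '.join(f'{k}={v!r}' for k, v in args.items())})"
            for i, (tool, args) in enumerate(plan, start=1)]


if __name__ == "__main__":
    print("\n".join(describe(LOGIN_PLAN)))
```

In practice the agent does not precompute this list. It chooses each next call after reading the result of the previous one, which is what lets it cope with pages it has never seen.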

Chrome DevTools MCP: Live Browser Debugging

The Chrome DevTools MCP server from Google takes a different approach. By default it launches its own Chrome, but it can also attach to a Chrome instance you are already running, talking to it over the Chrome DevTools Protocol (CDP).

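What It Provides

The server exposes DevTools capabilities as MCP tools: inspecting the DOM, reading console errors, recording performance traces, and reporting Web Vitals.

When to Use It

Chrome DevTools MCP is best for debugging and analyzing web applications you are actively developing, for example tracking down a console error or profiling a slow page load.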

Setup

You need Chrome running with remote debugging enabled:

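# Launch Chrome with remote debugging. Chrome 136+ ignores this flag for the
# default profile, so point --user-data-dir at a separate profile directory.
google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug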
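
Before pointing an MCP server at Chrome, it is worth confirming that the debugging port is actually listening. Chrome's remote-debugging HTTP endpoint `/json/version` returns JSON that includes a `webSocketDebuggerUrl` field. The following is a minimal standard-library sketch. The helper names are hypothetical, and port 9222 matches the launch command above.

```python
import json
import urllib.request


def parse_version_info(payload: str) -> str:
    """Extract the browser-level WebSocket URL from a /json/version response."""
    info = json.loads(payload)
    return info["webSocketDebuggerUrl"]


def fetch_debugger_url(port: int = 9222, timeout: float = 2.0) -> str:
    """Query the local remote-debugging endpoint. Raises URLError (an OSError)
    if nothing is listening, e.g. Chrome was started without the flag."""
    url = f"http://127.0.0.1:{port}/json/version"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_version_info(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print("CDP endpoint:", fetch_debugger_url())
    except OSError as exc:  # covers URLError, connection refused, timeouts
        print("Chrome is not listening for remote debugging:", exc)
```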

Then configure the MCP server:

{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["-y", "chrome-devtools-mcp@latest", "--browser-url=http://127.0.0.1:9222"]
    }
  }
}

Playwright MCP vs Chrome DevTools MCP

| Feature | Playwright MCP | Chrome DevTools MCP |
|---|---|---|
| Browser | Launches its own (headless or headed) | Connects to your running Chrome (or launches its own) |
| Best for | Automated testing, scraping, form filling | Debugging, performance analysis |
| Session | Its own profile, separate from your everyday browser (--isolated for a clean session each run) | Your live session (logged in, cookies intact) |
| Page interaction | Full (navigate, click, type, screenshot) | Supported (navigate, click, fill, screenshot), geared toward inspection |
| Performance profiling | No | Yes (traces, Web Vitals) |

Use Playwright MCP as your primary tool for browser automation. Use Chrome DevTools MCP when you need to analyze or debug your currently open browser tabs.

Collaborative Browsing: Sharing Your Browser with an Agent

Sometimes you want the agent to see exactly what you are seeing in your browser — your logged-in sessions, your open tabs, the page you are currently debugging.

Chrome Extension + MCP Approach

Browser extensions like BrowserMCP and mcp-chrome bridge your actual browser to your AI agent:

  1. Install the Chrome extension
  2. Configure the corresponding MCP server
  3. Your agent can now see your open tabs, read page content, take screenshots, and interact with pages
{
  "mcpServers": {
    "mcp-chrome": {
      "command": "npx",
      "args": ["-y", "mcp-chrome-server"]
    }
  }
}

mcp-chrome provides 20+ tools including tab management, page navigation, screenshot capture, script injection, network monitoring, and history navigation. It uses your existing Chrome profile, which means the agent has access to your logged-in sessions.

Security Warning

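This is the most powerful and most dangerous browser automation approach. When an agent has access to your live browser, it can read whatever is open in your tabs and act through your logged-in sessions, and any page it reads can try to steer it with injected instructions (prompt injection).

Mitigations: prefer an isolated browser (Playwright MCP or a containerized browser) unless you specifically need your logged-in state, close tabs for sensitive accounts before handing over control, and watch what the agent does while it has access.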
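
One mitigation is to limit where the agent may navigate. The sketch below is a hypothetical harness-side check, not a feature of BrowserMCP or mcp-chrome. It allows a URL only when the scheme is http(s) and the host is on an allowlist or is a genuine subdomain of an allowed host. The allowlist contents are illustrative.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: hosts the agent may visit (subdomains included).
ALLOWED_HOSTS = {"localhost", "docs.python.org", "github.com"}


def is_navigation_allowed(url: str, allowed: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True if url uses http(s) and targets an allowed host."""
    parts = urlsplit(url)
    if parts.scheme not in {"http", "https"}:
        return False  # block file:, javascript:, chrome:, and similar schemes
    host = (parts.hostname or "").lower()
    # Exact match, or a true subdomain ("api.github.com"), but not a
    # lookalike suffix ("evilgithub.com") or prefix ("github.com.attacker.example").
    return any(host == h or host.endswith("." + h) for h in allowed)


if __name__ == "__main__":
    for u in ["https://github.com/microsoft/playwright-mcp",
              "https://evilgithub.com/",
              "file:///etc/passwd"]:
        print(u, is_navigation_allowed(u))
```

Comparing parsed hostnames, rather than checking whether the URL string contains an allowed domain, is what defeats lookalike hosts.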

Autonomous Web Agents: Browser Use

For tasks that require multi-step autonomous web navigation, Browser Use wraps Playwright in an LLM control loop:

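import asyncio

# Recent Browser Use releases ship their own ChatAnthropic wrapper;
# older releases took a LangChain model (langchain_anthropic.ChatAnthropic).
from browser_use import Agent, ChatAnthropic


async def main():
    agent = Agent(
        task="Find the pricing page for Vercel, extract all plan names and prices",
        llm=ChatAnthropic(model="claude-sonnet-4-5-20250929"),
    )
    return await agent.run()  # run() is a coroutine, so it needs an event loop


result = asyncio.run(main())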

Browser Use takes a natural language task description, opens a browser, and autonomously navigates, clicks, types, and extracts information until the task is complete. Version 2.0 (released January 2026) achieved 83.3% accuracy on the WebVoyager benchmark.
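
The "LLM control loop" idea can be shown in miniature: observe the page, ask the model for the next action, execute it, and repeat until the model says it is done or a step budget runs out. This is a toy sketch, not Browser Use's implementation. A scripted function stands in for the model, a dictionary stands in for the browser, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str  # "click", "extract", or "done"
    target: str = ""


@dataclass
class FakeBrowser:
    """Stand-in for a real browser: page name -> {"text": ..., "links": [...]}."""
    pages: dict[str, dict]
    current: str = "home"

    def observe(self) -> dict:
        return self.pages[self.current]

    def act(self, action: Action) -> str:
        if action.kind == "click":
            self.current = action.target
            return f"navigated to {action.target}"
        if action.kind == "extract":
            return self.pages[self.current]["text"]
        return ""


def run_agent(browser: FakeBrowser, policy: Callable[[dict, list[str]], Action],
              max_steps: int = 10) -> list[str]:
    """Observe -> decide -> act loop; `policy` plays the role of the LLM."""
    history: list[str] = []
    for _ in range(max_steps):
        action = policy(browser.observe(), history)
        if action.kind == "done":
            break
        history.append(browser.act(action))
    return history


def scripted_policy(observation: dict, history: list[str]) -> Action:
    """Deterministic stand-in for an LLM: follow a pricing link, then extract."""
    if "pricing" in observation.get("links", []):
        return Action("click", "pricing")
    if not any(h.startswith("Plans:") for h in history):
        return Action("extract")
    return Action("done")


if __name__ == "__main__":
    site = {
        "home": {"text": "Welcome", "links": ["pricing", "docs"]},
        "pricing": {"text": "Plans: Starter, Pro", "links": []},
    }
    print(run_agent(FakeBrowser(site), scripted_policy))
```

Run as a script, this prints `['navigated to pricing', 'Plans: Starter, Pro']`. A real system replaces the scripted policy with a model call, parses the model's reply into actions, and puts a Playwright page behind `observe` and `act`. The step budget is what stops a confused model from looping forever.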

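When to use it: multi-step research or extraction tasks where you can describe the goal in natural language but not the exact sequence of clicks.

When not to use it: interactive work inside a coding session, where Playwright MCP lets your agent drive the browser step by step, or tests that must perform exactly the same actions on every run.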

Workflow Patterns

Pattern 1: Agent Does Research While You Work

The agent opens its own browser instance (pass --headless to Playwright MCP if you do not want a visible window), navigates to documentation or API references, extracts the information it needs, and returns results to the conversation. Your own browsing is not interrupted.

This is the default behavior with Playwright MCP — it launches a separate browser instance.

Pattern 2: Inspect What You Are Seeing

You are debugging a web page. You launch Chrome with --remote-debugging-port=9222, open the problematic page, and ask your agent to analyze it via Chrome DevTools MCP. The agent can inspect the DOM, check console errors, run performance traces, and suggest fixes.

Pattern 3: Shared Tab Browsing

You install a browser extension MCP server. The agent can see all your open tabs, switch between them, and interact with pages. Useful for collaborative debugging and guided web tasks where you want to point the agent at specific content.

Pattern 4: Isolated Sandbox Browsing

For interacting with untrusted content or automated testing, run the browser in a Docker container:

# Browserless provides headless Chrome in Docker
docker run -d --name browserless \
  -p 3100:3000 \
  --shm-size=2g \
  ghcr.io/browserless/chromium:latest

Connect Playwright to the containerized browser instead of a local one. The container is isolated — no access to your filesystem, cookies, or sessions.
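
With Playwright MCP, one way to do this is its `--cdp-endpoint` option, which attaches to an existing browser over CDP instead of launching one locally. The sketch below generates a config entry for the container started above. Port 3100 comes from the `-p 3100:3000` mapping. The assumption that Browserless accepts CDP connections at the bare WebSocket URL should be checked against the Browserless documentation, especially if you enable an auth token. `playwright_mcp_entry` is a hypothetical helper.

```python
import json


def playwright_mcp_entry(cdp_endpoint: str) -> dict:
    """Config entry that runs Playwright MCP against an already-running
    browser reached over CDP, instead of a locally launched one."""
    return {
        "command": "npx",
        "args": ["-y", "@playwright/mcp@latest", "--cdp-endpoint", cdp_endpoint],
    }


if __name__ == "__main__":
    # ws://localhost:3100 is an assumption about the Browserless CDP endpoint.
    config = {"mcpServers": {"playwright-sandbox": playwright_mcp_entry("ws://localhost:3100")}}
    print(json.dumps(config, indent=2))
```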

Desktop GUI Automation (Bonus)

Browser automation covers most needs, but sometimes you need to interact with the desktop itself — clicking buttons in native applications, taking screenshots of the full desktop, or typing into non-browser windows.

On Linux with Wayland (which is the default on most modern distributions), the primary tool for this is ydotool:

sudo apt install ydotool

# Type text into the focused window
ydotool type "Hello World"

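# Move the mouse to (100, 200) and left-click (ydotool 1.x syntax;
# 1.x also needs the ydotoold daemon running)
ydotool mousemove --absolute -x 100 -y 200
ydotool click 0xC0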

ydotool works at the kernel level (/dev/uinput), making it compositor-agnostic — it works on Wayland, X11, and even the Linux framebuffer.
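
The `0xC0` passed to `ydotool click` is a bitmask. The low bits select the button (0x00 left, 0x01 right, 0x02 middle), 0x40 means press, and 0x80 means release, so 0xC0 is a full left click. The sketch below computes these codes and wraps ydotool so an agent harness could expose it as a tool. The helper names are hypothetical, and the encoding follows ydotool 1.x's `click` documentation.

```python
import shutil
import subprocess

BUTTONS = {"left": 0x00, "right": 0x01, "middle": 0x02}
PRESS, RELEASE = 0x40, 0x80


def click_code(button: str = "left", press: bool = True, release: bool = True) -> str:
    """Encode a ydotool click: the button ID OR'd with press/release flags."""
    code = BUTTONS[button]
    if press:
        code |= PRESS
    if release:
        code |= RELEASE
    return f"0x{code:02X}"


def ydotool(*args: str, dry_run: bool = False) -> list[str]:
    """Run ydotool with the given arguments, or only return the argv if dry_run."""
    argv = ["ydotool", *args]
    if not dry_run:
        if shutil.which("ydotool") is None:
            raise FileNotFoundError("ydotool is not installed")
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    print(ydotool("type", "Hello World", dry_run=True))
    print(ydotool("click", click_code("left"), dry_run=True))
```

Passing arguments as a list instead of building a shell string means text the agent types, such as quotes or semicolons, can never be interpreted by a shell.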

For a more integrated approach, Anthropic's Computer Use reference implementation (the computer-use demo in anthropic-quickstarts) runs a full desktop environment inside a Docker container that Claude can control:

docker run -d \
  --name claude-computer-use \
  -p 8080:8080 \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

# Then open http://localhost:8080 for the combined chat and desktop view

This gives the agent a complete virtual desktop with mouse and keyboard control, useful for testing native applications or complex GUI workflows.

Choosing the Right Approach

| Scenario | Recommended Tool |
|---|---|
| Test a web app you are building | Playwright MCP |
| Debug a live web page | Chrome DevTools MCP |
| Autonomous multi-step web research | Browser Use |
| Read documentation during coding | Playwright MCP |
| Share your browser session with an agent | mcp-chrome extension |
| Interact with untrusted content | Dockerized browser (Browserless) |
| Automate native desktop applications | ydotool or Anthropic Computer Use |

Start with Playwright MCP. It covers the majority of use cases, runs in its own browser separate from the one you use every day, and works with every major AI coding agent. Add the other tools as specific needs arise.

