MCP vs CLI+SKILLs: when each pattern wins

8 min read · Reviewed May 2026 · architecture

Same Playwright capability, two fundamentally different integration patterns. MCP exposes Playwright as a rich tool surface for long-running agentic loops — persistent state, full accessibility tree, real-time introspection. CLI+SKILLs exposes it as concise commands for high-throughput coding agents — snapshots saved to disk, minimal schema in context, token-efficient. The token cost difference is roughly 4x on equivalent tasks, and the use cases genuinely diverge rather than overlap.

What MCP gives you, what it costs

MCP streams the full accessibility tree and tool schema into context, which enables rich iterative reasoning at the cost of significant token consumption.

Model Context Protocol exposes Playwright as a structured tool surface. When an agent operates via MCP, it receives a schema describing every available Playwright action, and for every page observation it receives the full accessibility tree — element labels, roles, states, and relationships — as structured data. The agent can request specific elements, observe the result of an action, and reason about what to do next with access to the full page structure. This is the right architecture for a task that requires iterative exploration: debugging a failed test, navigating an unfamiliar workflow, investigating a reported visual regression.
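To make the "structured tool surface" concrete, here is a minimal sketch of the request an MCP client sends when the agent picks a tool. MCP uses JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying a tool name and an arguments object. The tool names `browser_navigate` and `browser_snapshot` and the example URL are assumptions for illustration; check them against the schema your server actually advertises.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call(name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP server."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Assumed tool names and URL; verify against the server's `tools/list` response.
navigate = tool_call("browser_navigate", {"url": "https://example.com/checkout"})
snapshot = tool_call("browser_snapshot", {})

print(navigate)
print(snapshot)
```

Each response to a snapshot call is where the accessibility tree enters the context window, which is why the observation loop dominates token cost.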

The persistent browser context is MCP's defining advantage over one-shot approaches. The agent operates a real browser session that persists across multiple tool calls — it can navigate to a page, scroll, fill a form, submit, and observe the result, all in one continuous session. The state is real, not simulated. For tasks that require multi-step workflows with real side effects — an agent completing a checkout, an agent submitting a form and verifying the confirmation email — MCP is the only pattern that works cleanly.

The cost is context consumption. A detailed accessibility tree for a modern single-page application can run 20–40K tokens per page observation. A CI debugging session (navigate to the test page, observe state, try an action, observe the result, repeat) can reach 114K tokens before the agent arrives at a conclusion. At Claude 3.5 Sonnet's May 2026 rates, that is roughly $0.34 per session. For occasional debugging and exploratory testing, this is acceptable. For high-throughput CI across thousands of daily test runs, it is not.
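The per-session figure can be reproduced with a few lines of arithmetic. This sketch assumes a price of $3.00 per million tokens and treats all 114K tokens as input, which is consistent with the $0.34 figure quoted here but is not stated in the text. The daily run count is a hypothetical chosen only to show how the cost scales.

```python
def session_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of one session, treating every token as an input token."""
    return tokens * usd_per_million_tokens / 1_000_000


INPUT_PRICE = 3.00        # assumption: USD per million input tokens
SESSION_TOKENS = 114_000  # debugging-session figure from the text
RUNS_PER_DAY = 5_000      # hypothetical CI volume, for illustration only

per_session = session_cost_usd(SESSION_TOKENS, INPUT_PRICE)
per_day = per_session * RUNS_PER_DAY

print(f"per session: ${per_session:.2f}")
print(f"per day at {RUNS_PER_DAY:,} runs: ${per_day:,.0f}")
```

Under these assumptions a single session costs about a third of a dollar, while the same session repeated 5,000 times a day comes to roughly $1,710.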

What CLI+SKILLs gives you

Purpose-built commands, compact YAML snapshots, and no large schemas in context — the token-efficient path for coding agents that use browser automation as one tool among many.

CLI+SKILLs replaces the MCP server with a command-line interface and a library of SKILLs — purpose-built commands that encapsulate common browser actions. Instead of "here is the full schema of every action Playwright supports, please choose one", the agent calls pre-composed commands like `pw-snapshot`, `pw-click`, `pw-fill`, `pw-navigate`. The schema that lands in context is orders of magnitude smaller. Snapshots are saved to disk as compact YAML rather than streamed into the context window.
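The difference between streaming a snapshot and saving it to disk can be sketched without a browser. The example below is a toy model, not the actual CLI: the tree is synthetic, the four-characters-per-token estimate is a rough heuristic, and JSON stands in for the compact YAML format so the sketch needs only the standard library.

```python
import json
import tempfile
from pathlib import Path


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def stream_into_context(tree: dict) -> str:
    """Streaming style: the whole tree is returned to the model."""
    return json.dumps(tree)


def save_to_disk(tree: dict, directory: Path) -> str:
    """Disk style: write the tree to a file and return only a short pointer."""
    path = directory / "snapshot.json"
    path.write_text(json.dumps(tree))
    return f"snapshot saved: {path.name} ({len(tree['children'])} top-level nodes)"


# Synthetic accessibility tree standing in for a real page.
tree = {
    "role": "main",
    "children": [{"role": "button", "name": f"Item {i}"} for i in range(500)],
}

with tempfile.TemporaryDirectory() as tmp:
    streamed = stream_into_context(tree)
    pointer = save_to_disk(tree, Path(tmp))

print("streamed tokens:", estimate_tokens(streamed))
print("pointer tokens: ", estimate_tokens(pointer))
```

The agent can still read the file, or a slice of it, when it needs detail. The saving comes from not paying for the full tree on every observation.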

This is the architecture Microsoft recommends in 2026 for coding agents, meaning agents that balance browser automation with codebase navigation, reasoning, and code generation. The reasoning is that a coding agent has a finite context window to fill with many kinds of information: file contents, terminal output, test results, browser state. MCP consumes a disproportionate share of that window. CLI+SKILLs keeps browser state compact so the agent can hold more of the codebase in context simultaneously.
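A rough budget shows what "disproportionate share" means in practice. The sketch assumes a 200K-token context window and that every observation stays in context once received. Both are assumptions, as is the 100K reserved for code. The 20–40K range per observation is the figure quoted earlier.

```python
CONTEXT_WINDOW = 200_000                # assumption: model context window in tokens
OBSERVATION_TOKENS = (20_000, 40_000)   # per-observation range quoted earlier


def window_share(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Fraction of the context window used by one observation."""
    return tokens / window


def observations_that_fit(tokens_per_observation: int,
                          reserved_for_code: int,
                          window: int = CONTEXT_WINDOW) -> int:
    """How many full observations fit after reserving room for other content."""
    return max(0, (window - reserved_for_code) // tokens_per_observation)


for tokens in OBSERVATION_TOKENS:
    share = window_share(tokens)
    fit = observations_that_fit(tokens, reserved_for_code=100_000)
    print(f"{tokens:,} tokens per observation: {share:.0%} of the window, "
          f"{fit} observations alongside 100K of code")
```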

The trade-off is expressiveness. CLI+SKILLs covers the most common browser interactions with pre-built commands, but it does not give the agent the full Playwright API surface. Tasks that require unusual interactions — drag-and-drop, file upload, complex multi-step form flows — either need a pre-built SKILL or fall back to less elegant workarounds. For standard web navigation and form interaction, the SKILLs cover the vast majority of what agents actually need.

How to choose

The split is driven by agent architecture: iterative exploration with persistent context → MCP; high-throughput coding work with browser as one tool among many → CLI+SKILLs.

Use MCP when your agent is reasoning iteratively over page structure and needs persistent browser context across multiple steps. The canonical cases: an agent debugging a UI test failure and needing to observe page state at multiple points in the failure; an agent performing exploratory testing across an unfamiliar workflow; an agent doing accessibility audits where the full element structure matters for every decision. These are workloads where rich introspection pays back its token cost.

Use CLI+SKILLs when your agent is doing high-throughput coding work and needs browser automation as one capability among several. The canonical cases: a coding agent that runs tests, checks the result, reads the failure, edits code, and reruns; an agent generating test cases that occasionally needs to observe a live UI to understand component structure; CI automation where the same tasks run thousands of times and per-run token cost is a real constraint. These workloads do not need the full MCP schema on every invocation.

Most sophisticated production setups end up using both. The same Claude Code session might use CLI+SKILLs for high-throughput test generation and MCP for debugging a specific failure that requires iterative exploration. The two patterns are not mutually exclusive — they are optimised for different phases of the same workflow. Start with CLI+SKILLs as the default for new agent work and add MCP where you hit the limits of pre-built commands.
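The selection rule described in this section can be written down as a small function. This is a sketch of the guidance as stated, with hypothetical field names: CLI+SKILLs is the default, and MCP is chosen when the work is iterative and needs persistent context, or when the pre-built commands run out.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    MCP = "MCP"
    CLI_SKILLS = "CLI+SKILLs"


@dataclass
class Workload:
    # Hypothetical fields summarising the criteria discussed above.
    iterative_exploration: bool
    needs_persistent_context: bool
    hits_skill_limits: bool = False


def choose_pattern(workload: Workload) -> Pattern:
    """Default to CLI+SKILLs; switch to MCP only where the text says it pays off."""
    if workload.iterative_exploration and workload.needs_persistent_context:
        return Pattern.MCP
    if workload.hits_skill_limits:
        return Pattern.MCP
    return Pattern.CLI_SKILLS


debugging = Workload(iterative_exploration=True, needs_persistent_context=True)
ci_run = Workload(iterative_exploration=False, needs_persistent_context=False)

print(choose_pattern(debugging).value)
print(choose_pattern(ci_run).value)
```

A single session can call this per task rather than per agent, which matches the mixed setups described above.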
