Computer Use agents in 2025

I don’t like using computers. I love getting stuff done and creating new experiences, but I would much rather enjoy life and touch grass instead of typing into a slab of quartz.

Twitter guys are promising to solve my problem every day with a new 🚨 BREAKING proclamation. Some of them are promising that AI models are good enough to operate my computer for me just as I do – by clicking, scrolling and being utterly lost. Let’s take it for a spin.

What is CUA – Computer Use Agent

The idea is that yo hook up an AI model to a Browser or an Operating system and it acts like you. It typically has 4 actions:

Click stuff
Type stuff
Screenshot stuff to send to the model

The idea is that those actions are the same as you would take while navigating a computer. With Computer Use Agents, a system does not need to be AI-ready since Agent pretends to be a human.

Test case

I love RVing with my family, but the dirty secret of RVing in high season is that you have to book your campgrounds waaaay in advance, especially when Switzerland or Bavaria have holidays.

This is a gruesome situation because there isn’t anything like Airbnb or Booking.com for campgrounds (at least no reliable options in Europe). You must navigate each campground’s specific website or call them. This quickly becomes unwieldy with the various combinations of dates, campgrounds, and routes and a major bummer for planning vacation. With all these advances in AI using computers surely it’s a solved problem, right?

After all, booking Campsites is the first demo you see on OpenAI computer use (AKA Operator) website!

Well, meet the foe of computer use agents: the notorious calendar picker™️. Only one of the tested agents successfully defeated this peril (And yes, it was OpenAI. Maybe they trained specifically for this.)

The incomplete list of computer use agents

Title	Notes	Self hosted?	Result
OpenAI Operator	Part of ChatGPT Pro offering,	Cloud, $200/mo	In development
Browser Use	Didn’t build	✅ Docker + Python
Browser Use Cloud		Cloud $30/mo	Foiled by forms
Browser MCP	MCP Server connecting to Chrome plugin	✅ Chrome Plugin	Foiled by Cookie Banner
Browserable	Didn’t build, looks Promising	✅ Docker
Bytebot	Uses Claude	✅ Docker
Scrapybara	Pay per use, uses OpenAI CUA	Cloud, Pay Per use + plans	Foiled by forms
Open Operator	OpenAI CUA version of Browser Base	Cloud, Free? +Browserbase plans	Foiled by forms
OpenAI CUA Sample App	OpenAI Demo App using OpenAI CUA	✅ Docker or Python	✅ Great success

OpenAI Operator

Operator is part of ChatGPT Pro offering (the $200 per month plan). OpenAI is promising to make Operator fill forms, book travel and appointments, order groceries, and so on:

We’re making CUA available through a research preview of Operator, an agent that can go to the web to perform tasks for you. Operator is available to Pro users in the U.S. at operator.chatgpt.com⁠(opens in a new window).

Unfortunately, I am not a subscriber to the $200/plan, but fortunately for me, OpenAI recently released Responses API, which exposes Computer Use as a tool you can integrate with your own Chrome instance.

Browser Use

Browser Use is one of Y Combinator’s (W25) computer use agent startups that also has an open-source offering. You can try and install WebUI and the container for the whole setup from this repository.

I really wanted to run the self-hosted version, but I failed at building the project multiple times. It took me way too long to figure out that the reason was me running out of disk space, and I blamed the project for it. After figuring this out, I just paid for their cloud version hoping that’s the SOTA.

Browser Use Cloud

Browser user interface for triggering the agents was really neat and it was all really simple. You get a chat on the left with steps and the preview of the agent working on the right.

Everything was great, except the agent was totally lost on these camping websites.

It used the wrong booking link (inquiry vs booking is a known problem on these sites)
It absolutely could not figure out how to operate the calendar form on 3 seperate sites, never switching from April to August
It put the age of my child as 35041 in one instance.

I think it uses 4o instead of a smarter model for navigating, and that’s the reason. I’m excited to see where the project proceeds in the future, but for now it’s not the answer to my problems.

Browser MCP

Browser MCP is a local MCP server combined with a Chrome plugin that uses the MCP protocol to control your browser. It seems geared toward automated testing of your changes while developing in tools like Cursor. I have high hopes to use that tool in other projects.

Setting up BrowserMCP

Install BrowserMCP Chrome extension

First, you need to install the BrowserMCP Chrome extension. It would probably be prudent to install it in Chromium and not your main browser. But I am not a prudent person.

One thing worth noting is that I can never predict which tab BrowserMCP decides to use. I keep trying to open it in incognito window but it sometimes decides to use my main session where I’m logged into critical stuff. What can possibly go wrong?

Set up MCP Server in Cursor

Put this in .cursor/mcp.json

{
	"mcpServers": {
	  "browsermcp": {
		"command": "npx",
		"args": ["@browsermcp/mcp@0.1.3"]
	  }
	}
}

I thought it was best to restart the cursor after adding MCP servers. For some reason, they fail to connect sometimes, or a cursor upgrade is needed or something, but they keep being a little bit unreliable for me. Once it’s successfully connected, it can list the tools:

But sometimes it will fail to connect for no good reason:

Using Browser MCP

Now you can ask Cursor to do stuff for you. It will open websites and successfully navigate them, but the project seems to be geared towards testing apps it is familiar with. For example, cookie banner seems to be a major foe requiring valiant efforts:

Claude Sonnet 3.7-thinking seems best at this sort of task, but it also had major problems. I’m not sure Cursor+MCP is a good fit for external websites. I suspect a good computer user agent needs to use vision and I don’t think Cursor uses screenshots to communicate with this browser MCP plugin. I suspect it’s using the raw HTML output and thus it cannot successfully navigate obstacles.

I have high hopes of using this combo in my day-to-day coding though.

Browserable

Browserable looks very promising, but it also failed to build on my computer. I now suspect it’s the same disk space issue as before, but I didn’t have the energy to try again.

Bytebot

Bytebot (YC S21) also has a Open Source repository of containerized Chrome instance with agent. It seems to be using Anthropic and I really wanted to test OpenAI CUA, so I didn’t check this one.

Scrapybara

Scrapybara (another YC Combinator bet on browser agents – F24) looks sweet because you only pay per use. You don’t have to buy a subscription to start launching your agents. On top of that – they seem to have worked with OpenAI to integrate their Computer Use Agent API, which was the main reason for me to test these various computer use agents.

You have limited credits to use for free, which made trying this out a no-brainer. Also, I love the aesthetic.

Unfortunately, despite supposedly smarter model, it gets stuck in loops trying to fill out simple forms:

They have a really neat API though and a promising price structure, so I will try them again.

There doesn’t seem to be any open-source version that you can run on your own computer.

Open Operator

Open Operator is a proof of concept by Browser Base – whole browser on the web designed to be ran by agents. They have created a whole SDK for agents to interact with websites.

BrowserBase appears to utilize a combination of models, whereas Open Operator is specifically designed to operate with the OpenAI Computer Use Agent API.

It made good progress in my tests, but got defeated by the calendar selection form:

OpenAI CUA Sample App

This is an unknown demo example app of OpenAI CUA API that you can run on your own computer, and so far it performed the best.

It successfully navigated all obstacles and reported the result: No availability for those dates.

Running OpenAI CUA Sample App

Clone the repository

Edit your OpenAI key in.env

Set up the project

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run the REPL environment to connect to your local browser to run on your computer:

python cli.py --computer local-playwright

There are also examples on how to connect to Browserbase and Scrapybara -this is how I learned about those projects.

Computer Use agents in 2025

Test case

The incomplete list of computer use agents

OpenAI Operator

Browser Use

Browser Use Cloud

Browser MCP

Setting up BrowserMCP

Install BrowserMCP Chrome extension

Set up MCP Server in Cursor

Using Browser MCP

Browserable

Bytebot

Scrapybara

Open Operator

OpenAI CUA Sample App

Running OpenAI CUA Sample App

Related

Leave a ReplyCancel reply

Test case

The incomplete list of computer use agents

OpenAI Operator

Browser Use

Browser Use Cloud

Browser MCP

Setting up BrowserMCP

Install BrowserMCP Chrome extension

Set up MCP Server in Cursor

Using Browser MCP

Browserable

Bytebot

Scrapybara

Open Operator

OpenAI CUA Sample App

Running OpenAI CUA Sample App

Related

Leave a ReplyCancel reply

Discover more from Artur Piszek

Discover more from Artur Piszek