Alfred Intelligence 3A: OpenAI Operator design details
Despite its deceptively simple chat interface, Operator reflects many careful design considerations and details. Or more likely, the latter led to the former.
This is a special off-schedule issue of Alfred Intelligence (hence 3A) because I was spending too much time scrolling Twitter this morning and writing felt more productive.
If you have been following AI news, you couldn’t have missed OpenAI’s launch of Operator and the endless commentary. While most people talk about what it can or cannot do, I want to discuss its design, especially since I have been thinking about and building an AI assistant myself. The Operator team has made some great design decisions about the interface, human-assistant handoffs, and onboarding, and a few that I believe can be improved.
Here are my notes:
What I love about Operator’s design
1. Polished consumer product experience
Right off the bat, Operator came across as a polished consumer product, especially compared to Anthropic’s Claude computer use. Even the names (“Operator” vs literally “computer use”) reinforce the respective impressions. ChatGPT Pro customers in the US can already use it by simply going to https://operator.chatgpt.com/, while I have to use Docker to try Claude computer use.1
There are three “tiny” design details of Operator that I love:
One, it has a chat interface that is almost identical to ChatGPT, which people are already familiar with. When it needs to use the computer, it shows a small computer screen in the chat history.
If you want, you can click on the small computer screen to get a bigger computer screen in a popup panel. Or you can expand the computer screen to view the chat and computer screen side by side. On the other hand, most other apps show the chat and computer screen side by side by default (see below for Claude computer use’s interface). I like Operator’s design for two reasons:
Having the small computer screen in the chat history maintains the structure of the conversation—At this point in the chat, I need to use the computer. Now, I’m using the computer.—rather than having the computer screen always present.
Consequently, when Operator doesn’t need to use the computer, it doesn’t have to show the computer screen. With the side-by-side design, the chat becomes secondary to the computer screen, even though the conversation with the AI assistant should be the primary focus. When we are comfortable with AI assistants doing things for us in the background, seeing the computer screen will be unnecessary (though it might take some time to get there).
Two, Operator’s interface is clean. One thing I dislike about Claude computer use is how Claude’s thoughts, actions, and screenshots clutter the chat history, making it almost impossible to browse previous messages.
Operator shows the steps inside a toggled panel, which can be hidden away. The steps are also in plain English (vs raw tool calls like Tool Use: computer, Input: {'action': 'key', 'text': 'Page_Down'}). Clicking on a past step shows the screenshot taken at that step, so you can go back in time and see what was done. This is especially useful when the assistant fails and you want to see where it went wrong. The design is likely intentional, for building trust. But it also feels like a debugging interface for developers; if a task takes 50 to 100 steps, this list of steps might not be user-friendly anymore.
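To make the contrast concrete, here is a minimal sketch of rendering raw computer-use tool calls as plain-English steps. The action and input shapes mirror Anthropic’s computer-use tool format; the rendering logic is my own illustration, not what either company actually ships.

```python
# Sketch: turning raw computer-use tool calls into plain-English steps.
# The action/input shapes mirror Anthropic's computer-use tool format;
# the descriptions themselves are purely illustrative.

def describe_step(tool_input: dict) -> str:
    """Render a raw tool-use action as a human-readable step."""
    action = tool_input.get("action")
    if action == "key":
        return f"Pressed the {tool_input['text']} key"
    if action == "type":
        return f'Typed "{tool_input["text"]}"'
    if action == "left_click":
        x, y = tool_input["coordinate"]
        return f"Clicked at ({x}, {y})"
    if action == "screenshot":
        return "Took a screenshot"
    return f"Performed action: {action}"

print(describe_step({"action": "key", "text": "Page_Down"}))
# -> Pressed the Page_Down key
```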

Three, Operator’s computer looks modern. The team chose to use a remote Chrome instance, which looks just like what most people use on their laptops nowadays. Anthropic uses a remote computer running Ubuntu 22.04 with apps such as LibreOffice Calc. Even though Ubuntu 22.04 was released just two years ago, it looks like a 1990s desktop.
2. Expectable human-assistant handoffs
The Operator team believes that while we want AI assistants to do things for us, humans should stay in the loop to prevent irreversible mistakes. The CUA model is not only trained to reason through the steps to complete a task but also “trained to ask for user confirmation before finalizing tasks with external side effects, for example before submitting an order, sending an email, etc., so that the user can double-check the model’s work before it becomes permanent.”
I suspect getting the assistant to check before acting is also possible via prompt engineering. But I also suspect the Operator team tried both and found better adherence through model training (or a combination of the two) than through prompt engineering alone.
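As a rough illustration of what the prompt-engineering route might look like, here is a hypothetical system prompt and confirmation guard. The prompt wording, the CONFIRM convention, and the helper functions are all made up for illustration; OpenAI bakes this behavior into the model through training instead.

```python
# Hypothetical sketch: asking for confirmation before side-effecting
# actions via prompt engineering alone. The prompt wording, the CONFIRM
# convention, and execute() are invented for illustration.

SYSTEM_PROMPT = """You are a browser-using assistant.
Before any action with external side effects (submitting an order,
sending an email, posting a form), output exactly:
CONFIRM: <one-line summary of the action>
and wait for the user's approval before proceeding."""

def execute(action: str) -> None:
    print(f"Executing: {action}")  # stand-in for a real browser action

def run_step(model_output: str) -> None:
    if model_output.startswith("CONFIRM:"):
        answer = input(f"{model_output}\nProceed? [y/N] ")
        if answer.strip().lower() != "y":
            print("Action cancelled by user.")
            return
        execute(model_output.removeprefix("CONFIRM:").strip())
    else:
        execute(model_output)

run_step("CONFIRM: Submit the Instacart order for bread and eggs")
```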
I’m also impressed that Operator can detect situations that might require human input and check with the user, such as the example in the screenshot below.
I have been thinking about this design consideration: It seems risky for now to let the assistant do everything autonomously, but it also defeats the point if the assistant has to check every step of the way. How do we get it to check only in necessary situations? Or even better, how do we get the AI assistant to know enough about us so that it doesn’t have to ask?
3. Other tiny design details
Ideas. The problem with a chat interface is the lack of affordances. People don’t know what to do with it. It is like being given a lamp and told a genie will grant you any wish and you can do anything… except you can’t do this, you can’t do that yet. The Operator team tackles this by suggesting pre-filled prompts so that you can get a sense of what it can do.
Custom instructions and saved prompts. One of my complaints about ChatGPT is that it doesn’t understand me well enough, so I always have to include my preferences in my prompts. ChatGPT has memory, which automatically picks up details and preferences, but I have found it not accurate enough. Similarly, with Operator, you can include custom instructions for all tasks or for specific websites. While I’m not sure how well it works, from a product design point of view, having it is likely better than not, since there is little downside. Operator also lets you save prompts, which resolves my complaint about ChatGPT.


Parallel tasks. You can simply start another chat and leave tasks running in the background. Since each task can take Operator (or any existing assistant) a while to complete, being able to run multiple tasks in parallel is necessary. My only gripe is that the chat interface encourages real-time conversation; the embedded computer screen makes people want to watch it. But I’m still not sure what a better paradigm would be.
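Mechanically, running several long tasks in parallel is the easy part; the sketch below shows the idea with asyncio, where run_task is a stand-in for a real agent run, not anything from Operator itself.

```python
# Sketch: several long-running assistant tasks in parallel, the way
# Operator lets you open multiple chats. run_task is a stand-in for a
# real agent run; the sleep simulates a slow multi-step task.
import asyncio

async def run_task(name: str, seconds: int) -> str:
    await asyncio.sleep(seconds)
    return f"{name}: done after {seconds}s"

async def main() -> None:
    results = await asyncio.gather(
        run_task("Book a restaurant", 3),
        run_task("Order groceries", 5),
        run_task("Compare flight prices", 4),
    )
    for line in results:
        print(line)

asyncio.run(main())
```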
Taking control. At any point, the user can take over the computer and use it. The team said Operator cannot see what the user is doing, which probably just means it stops taking screenshots when the user has control. The purpose is to keep your session private, such as when you are entering sensitive information. You then have to tell Operator what you have done, though Yash from OpenAI said Operator can also take another screenshot and figure out what has changed. This feels like an interesting design decision: building trust with the user was prioritized over convenience.
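If that guess is right, the mechanism could be as simple as skipping screenshot capture while the user has control. A speculative sketch (the Session class and every name in it are stand-ins, not OpenAI’s code):

```python
# Speculative sketch of "taking control": the agent loop simply stops
# capturing screenshots while the user has control, then resumes (and
# could diff before/after screenshots to infer what changed).
import time
from dataclasses import dataclass, field

@dataclass
class Session:
    user_has_control: bool = False
    done: bool = False
    screenshots: list = field(default_factory=list)

    def capture_screen(self) -> str:
        return "<screenshot>"  # placeholder for real screen capture

def agent_loop(session: Session) -> None:
    while not session.done:
        if session.user_has_control:
            time.sleep(0.5)  # deliberately blind while the user acts
            continue
        session.screenshots.append(session.capture_screen())
        session.done = True  # a real loop would plan the next action here
```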
What I think can be improved about Operator’s design
1. Visual is inefficient
As I mentioned previously, it would be a lot faster if the AI could “read” a webpage by extracting the HTML rather than looking at screenshots of the page like a human would. For example, Claude computer use slowly scrolled through and took screenshots of my 3,000-word annual review instead of scraping the source code. Andrej Karpathy had a similar experience with Operator. But there are also situations where vision is necessary, such as identifying objects in images. The challenge is probably training the model to decide when to act like a human and when to act like a computer; getting this right would significantly reduce the time required to complete tasks.
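For example, pulling the text of a long page out of its HTML takes a single request, versus dozens of scroll-and-screenshot steps. A sketch using requests and BeautifulSoup (the URL is a placeholder):

```python
# Sketch: "reading" a page by extracting its HTML instead of scrolling
# through screenshots. Requires: pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

def read_page_text(url: str) -> str:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):  # drop non-visible content
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)

# Placeholder URL; one request replaces a long scroll-and-screenshot loop.
print(read_page_text("https://example.com")[:500])
```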

2. Actual speed vs perceived speed
Operator is slow. To make AI assistants faster (setting aside task performance for a second), we might “just” need smaller models and more GPUs, which OpenAI will no doubt achieve eventually. But actual speed might not matter as much as perceived speed. Human executive assistants don’t complete tasks in seconds; they work on multiple tasks in the background and ask their bosses for input in batches. The challenge is getting the user to trust the AI assistant enough to let it work in the background, without feeling the need to constantly monitor it and approve each step. For this reason, as I mentioned above, I’m not sure the chat interface is the right paradigm. Perhaps there should be no interface at all, so that we will (be forced to) delegate completely, as Linus suggested. This will take some time, though; the Hacker News comment below is a common sentiment.
I am a little concerned with letting an AI agent that routinely hallucinates control my browser. I can't not watch it do the task, in case it messes up. So I am not sure what the value is versus me doing it myself.
3. Other tiny design details
Memory. I’m surprised Operator doesn’t seem to have memory, even though ChatGPT does. Assuming memory works as intended, it should collect enough information about the user’s preferences over time to act on the user’s behalf instead of constantly checking in.
Scheduling. OpenAI just released ChatGPT Tasks for scheduling recurring tasks, and it seems perfectly suited for Operator: go through my emails at 7 am every morning, buy bread and eggs on Instacart every Sunday night, pay my bills at the end of every month.
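A recurring Operator task could be as simple as a cron expression attached to a saved prompt. A hypothetical sketch (ChatGPT Tasks’ internals are not public, so this is purely illustrative):

```python
# Hypothetical sketch: recurring Operator tasks as cron expressions
# attached to saved prompts. Not OpenAI's API; purely illustrative.
SCHEDULED_TASKS = [
    ("0 7 * * *",     "Go through my emails and flag anything urgent"),
    ("0 21 * * 0",    "Buy bread and eggs on Instacart"),
    ("0 9 28-31 * *", "Pay my bills"),  # rough end-of-month approximation
]

for cron, prompt in SCHEDULED_TASKS:
    print(f"{cron:>14}  ->  {prompt}")
```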
Local computer. Since Operator uses a remote browser, it cannot interact with the files and apps on my computer, which might account for a long tail of use cases. But to be fair, almost every file can be hosted online now, and I store most of my files in the cloud anyway. And increasingly, many desktop apps have a web version (including Photoshop). Yash from OpenAI did mention in the demo that the Computer-Using Agent (CUA) model could technically operate a full computer, but that doesn’t seem to be available in Operator yet.
Overall, I think OpenAI has built a beautiful and well-designed product, especially since it’s the first “consumer-grade” AI assistant that can complete complex tasks for us. I certainly hope it won’t be the only one, though!
This could be intentional, as OpenAI focuses on consumers while Anthropic focuses on enterprises (or developers in enterprises).