OpenAI Reportedly Launching ‘Operator’ That Can Control Your Computer This Week


OpenAI is reportedly preparing to launch Operator sometime this week. Operator is the name of its computer-use agent, which can complete tasks in a user’s web browser on their behalf. Other companies, including Google and Anthropic, have been developing similar “agents” in hopes they will be the next major leap toward AI fulfilling its promise of performing tasks currently done by humans.

According to The Information, which first reported on the impending launch, Operator will provide users with suggested prompts in categories like travel, dining, and events. Users could, for instance, ask Operator to find a good flight from New York to Maui that would not have them landing too late in the evening. Operator will not complete a transaction; the user will remain in the loop and complete the checkout process.

It is easy to imagine certain ways Operator could be useful. Aging individuals who are not computer-savvy could potentially ask Operator to help them send an email, and see it navigate to Gmail and open a compose window for them. Tech-savvy people do not need this type of help, but older generations often struggle to navigate the web, and completing even simple tasks can be a challenge. Bots could help in other areas as well, such as quality-assurance testing, where companies need to check that their new websites or services work properly.

So-called “computer use agents” do come with potential risks. We have already seen a startup introduce a web-navigating bot to automate the process of posting marketing spam to Reddit. Bots that take control of the end-user client are able to bypass API limitations meant to block automation. AI startups will need to take some measures to combat abuse, or else websites will become even more flooded with spam than they are today.

Agents like Operator essentially work by taking screenshots of a user’s browser and sending the images back to OpenAI for analysis. Once its models determine the next step necessary to complete a task, a command is sent back to the browser to move and click the mouse on the appropriate target, or type into an input box. The approach takes advantage of multimodal technology that OpenAI and others have been developing, which can interpret multiple forms of input, in this case text and imagery.
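To make that loop concrete, here is a minimal Python sketch of the screenshot, decide, act cycle. It is an illustration under stated assumptions, not OpenAI's implementation: `FakeBrowser`, `scripted_model`, `Action`, and `run_agent` are all hypothetical names, the "model" is a fixed script rather than a real remote call, and the step cap is an added safeguard that the reporting does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One command sent back to the browser (hypothetical format)."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeBrowser:
    """Stand-in for a real browser; it only records what the agent does."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> bytes:
        # A real client would capture pixels; this returns a placeholder.
        return f"frame-{len(self.log)}".encode()

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")


def scripted_model(task: str, image: bytes, step: int) -> Action:
    """Assumed stand-in for the remote model: replays a fixed script."""
    script = [
        Action("click", x=120, y=48),
        Action("type", text="hello"),
        Action("done"),
    ]
    return script[min(step, len(script) - 1)]


def run_agent(
    task: str,
    browser: FakeBrowser,
    model: Callable[[str, bytes, int], Action],
    max_steps: int = 10,
) -> bool:
    """Screenshot, ask the model for the next step, execute it, repeat.

    Returns True if the model reports the task done, False if the
    step cap is reached first.
    """
    for step in range(max_steps):
        image = browser.screenshot()
        action = model(task, image, step)
        if action.kind == "done":
            return True
        if action.kind == "click":
            browser.click(action.x, action.y)
        elif action.kind == "type":
            browser.type_text(action.text)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
    return False


browser = FakeBrowser()
finished = run_agent("compose an email", browser, scripted_model)
print(finished, browser.log)
```

Every step in this design costs a full round trip: capture an image, send it off, wait for a decision, then act. A real system would also need to decide when to hand control back to the user, which this sketch leaves out.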

The entire promise of a recent crop of AI startups is that they will be able to create an artificial general intelligence (AGI) that can replace humans in most of the tasks they perform today and make everyone’s lives more efficient. As exponential gains in the performance of language models have slowed, these companies have been looking for new unlocks that will get them there, and computer-use agents are one. An artificial intelligence cannot truly replace humans until it can actually carry out tasks for them; writing is just one part of a task. Bots also need to be able to navigate spreadsheets, watch videos, and more.

After Anthropic released an initial preview of its computer use bot, early testers complained it was half-baked at best, getting stuck in loops when it does not know what to do or forgetting the task and starting to do something else entirely, like looking at pictures of nature on Google Images. It is also slow, and expensive to operate.

Keeping humans in the loop will be essential with a bot that is granted such high-level control and access to critical data. Computer-use agents may turn out to be akin to self-driving cars. Google was able to make a car drive down a straightaway on its own easily enough, but the edge-case scenarios have taken years to solve.

There is debate on how to measure AGI and when it will be “achieved,” but OpenAI has told its biggest backer Microsoft that it believes AGI will be reached once it has created an AI that can generate at least $100 billion in profit. That is a lofty goal considering OpenAI predicts it will generate $12 billion in revenue in 2025 while still losing billions.

At the same time, neither Microsoft nor Google has seen enterprise customers adopt AI tools as fast as they hoped. Instead of charging $20-30 per employee to add AI tools to their bundles, both companies are now shoving AI into their standard bundles and hiking prices by a couple of dollars.

