Why OpenAI isn’t bringing deep research to its API just yet


OpenAI says that it won’t bring the AI model powering deep research, its in-depth research tool, to its developer API while it figures out how to better assess the risks of AI convincing people to act on or change their beliefs.

In an OpenAI whitepaper published Wednesday, the company wrote that it’s in the process of revising its methods for probing models for “real-world persuasion risks,” like distributing misleading info at scale.

OpenAI noted that it doesn’t believe the deep research model is a good fit for mass misinformation or disinformation campaigns, owing to its high computing costs and relatively slow speed. Nevertheless, the company said it intends to explore factors like how AI could personalize potentially harmful persuasive content before bringing the deep research model to its API.

“While we work to reconsider our approach to persuasion, we are only deploying this model in ChatGPT, and not the API,” OpenAI wrote.

There’s a real fear that AI is contributing to the spread of false or misleading information meant to sway hearts and minds toward malicious ends. For example, last year, political deepfakes spread like wildfire around the globe. On election day in Taiwan, a Chinese Communist Party-affiliated group posted AI-generated, misleading audio of a politician throwing his support behind a pro-China candidate.

AI is also increasingly being used to carry out social engineering attacks. Consumers are being duped by celebrity deepfakes offering fraudulent investment opportunities, while corporations are being swindled out of millions by deepfake impersonators.

In its whitepaper, OpenAI published the results of several tests of the deep research model’s persuasiveness. The model is a special version of OpenAI’s recently announced o3 “reasoning” model optimized for web browsing and data analysis.

In one test that tasked the deep research model with writing persuasive arguments, the model performed the best out of OpenAI’s models released so far — but not better than the human baseline. In another test that had the deep research model attempt to persuade another model (OpenAI’s GPT-4o) to make a payment, the model again outperformed OpenAI’s other available models.

OpenAI deep research test
The deep research model’s score on MakeMePay, a benchmark that tests a model’s ability to persuade another model for cash.Image Credits:OpenAI

The deep research model didn’t pass every test for persuasiveness with flying colors, however. According to the whitepaper, the model was worse at persuading GPT-4o to tell it a codeword than GPT-4o itself.

OpenAI noted that the test results likely represent the “lower bounds” of the deep research model’s capabilities. “[A]dditional scaffolding or improved capability elicitation could substantially increase
observed performance,” the company wrote.

We’ve reached out to OpenAI for more information and will update this post if we hear back.


Leave a Comment