Last week, after briefly deposed CEO Sam Altman was reinstalled at OpenAI, two reports claimed that a top-secret project at the company had rattled some researchers there with its potential to solve intractable problems in a powerful new way.
“Given vast computing resources, the new model was able to solve certain mathematical problems,” Reuters reported, citing a single unnamed source. “Though only performing math on the level of grade-school students, acing such tests made researchers very optimistic about Q*’s future success.” The Information said that Q* was seen as a breakthrough that would lead to “far more powerful artificial intelligence models,” adding that “the pace of development alarmed some researchers focused on AI safety,” citing a single unnamed source.
Reuters also reported that some researchers sent a letter expressing concerns about Q*'s potential power to the nonprofit board that ejected Altman, although a WIRED source familiar with the board's thinking says that was not the case. And perhaps in part thanks to its conspiracy-evoking name, speculation about Q* surged over the Thanksgiving weekend, building a fearsome reputation for a project about which we know next to nothing. Altman himself appeared to confirm the existence of the project when asked about Q* in an interview with The Verge yesterday, saying "No particular comment on that unfortunate leak."
What could Q* be? Combining a close read of the initial reports with consideration of the hottest problems in AI right now suggests it may be related to a project that OpenAI announced in May, claiming powerful new results from a technique called “process supervision.”
The project involved Ilya Sutskever, OpenAI's chief scientist and cofounder, who helped oust Altman but later recanted; The Information says he led work on Q*. The work from May was focused on reducing the logical slipups made by large language models (LLMs). Process supervision, which involves training an AI model to break down the steps needed to solve a problem, can improve an algorithm's chances of getting the right answer. The project showed how this could help LLMs, which often make simple errors on elementary math questions, tackle such problems more effectively.
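To make the idea concrete, here is a toy contrast between rewarding only the outcome and rewarding each step. Everything in this sketch is illustrative: the step checker is a stand-in for the learned reward model OpenAI described, and the worked solution is hand-written arithmetic with one deliberate slip-up.

```python
# Illustrative only: "outcome supervision" rewards the final answer,
# while "process supervision" rewards each reasoning step individually.
# The step checker below is a hypothetical stand-in for a learned reward
# model; here each step is simply verified arithmetically.

def outcome_reward(final_answer, expected):
    """Outcome supervision: one reward signal for the final answer only."""
    return 1.0 if final_answer == expected else 0.0

def process_rewards(steps, check_step):
    """Process supervision: a separate reward for every intermediate step."""
    return [1.0 if check_step(step) else 0.0 for step in steps]

# A model's flawed worked solution to 12 * 34, with one step slipping up:
steps = [
    ("12 * 34 = 12 * 30 + 12 * 4", 12 * 30 + 12 * 4 == 12 * 34),  # valid
    ("12 * 30 = 350",              12 * 30 == 350),               # slip-up
    ("12 * 4 = 48",                12 * 4 == 48),                 # valid
    ("350 + 48 = 398",             350 + 48 == 398),              # valid
]
check = lambda step: step[1]   # stand-in for a per-step reward model

print(outcome_reward(398, 12 * 34))   # 0.0: wrong answer, but no clue where
print(process_rewards(steps, check))  # [1.0, 0.0, 1.0, 1.0]: pinpoints the bad step
```

The outcome signal only says that something went wrong; the per-step signals point at the exact slip-up, which is the intuition behind process supervision.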
Andrew Ng, a Stanford University professor who led AI labs at both Google and Baidu and who introduced many people to machine learning through his classes on Coursera, says that shoring up large language models' math skills is a logical next step in making them more useful. "LLMs are not that good at math, but neither are humans," Ng says. "However, if you give me a pen and paper, then I'm much better at multiplication, and I think it's actually not that hard to fine-tune an LLM with memory to be able to go through the algorithm for multiplication."
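Ng's pen-and-paper remark is easy to make literal. The sketch below (an illustration, not a method from Ng or OpenAI) works through grade-school long multiplication while recording each partial product, the kind of scratchpad an LLM with memory could be fine-tuned to emit:

```python
# Grade-school long multiplication, decomposed into recorded steps.
# Illustrative only: this emulates the "scratchpad" of intermediate
# work an LLM could be trained to write out instead of guessing the
# product in one shot.

def long_multiply(a, b):
    """Multiply via partial products, recording every intermediate step."""
    scratchpad, total = [], 0
    for i, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * 10**i
        total += partial
        scratchpad.append(f"{a} * {digit} * 10^{i} = {partial}")
    scratchpad.append(f"sum of partials = {total}")
    return scratchpad, total

steps, answer = long_multiply(123, 45)
print("\n".join(steps))   # each partial product, then the sum
print(answer)             # 5535
```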
There are other clues to what Q* could be. The name may be an allusion to Q-learning, a form of reinforcement learning that involves an algorithm learning to solve a problem through positive or negative feedback, which has been used to create game-playing bots and to tune ChatGPT to be more helpful. Some have suggested that the name may also be related to the A* search algorithm, widely used to have a program find the optimal path to a goal.
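If the name really does nod to Q-learning, the heart of that technique fits in a few lines. Below is a minimal tabular sketch in which an agent learns, from reward alone, to walk down a five-state corridor to a goal; the toy environment and hyperparameters are inventions for illustration and say nothing about what Q* actually is.

```python
import random

# Tabular Q-learning on a toy five-state corridor. The agent starts at
# state 0 and gets a reward of 1.0 only for reaching state 4.
# Purely illustrative; nothing here reflects how OpenAI built Q*.

N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                      # step left, step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

random.seed(0)
for _ in range(200):                    # training episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy: usually exploit the best-known action, sometimes explore
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        nxt, reward, done = step(s, a)
        # Q-learning update: nudge Q toward reward + discounted best next value
        best_next = max(Q[(nxt, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])
        s = nxt

# The learned greedy policy should step right from every non-goal state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)}
print(policy)
```

The update rule is the whole trick: each Q value is nudged toward the immediate reward plus the discounted value of the best next action, so positive feedback at the goal gradually propagates backward through the table.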
The Information throws another clue into the mix: “Sutskever’s breakthrough allowed OpenAI to overcome limitations on obtaining enough high-quality data to train new models,” its story says. “The research involved using computer-generated [data], rather than real-world data like text or images pulled from the internet, to train new models.” That appears to be a reference to the idea of training algorithms with so-called synthetic training data, which has emerged as a way to train more powerful AI models.
Subbarao Kambhampati, a professor at Arizona State University who is researching the reasoning limitations of LLMs, thinks that Q* may involve using huge amounts of synthetic data, combined with reinforcement learning, to train LLMs to specific tasks such as simple arithmetic. Kambhampati notes that there is no guarantee that the approach will generalize into something that can figure out how to solve any possible math problem.
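The synthetic-data idea Kambhampati describes is easy to sketch: a short program can mint unlimited arithmetic problems complete with worked solutions, no web scraping required. The record format below is hypothetical, just one shape such fine-tuning data might take.

```python
import random

# Generating synthetic (problem, worked-solution) training pairs for
# simple addition. The prompt/completion format is a hypothetical
# illustration, not OpenAI's actual data pipeline.

def worked_solution(a, b):
    """Add b to a one place value at a time, recording each step."""
    steps, total = [], a
    for place in (100, 10, 1):
        part = (b // place) % 10 * place
        if part:
            steps.append(f"{total} + {part} = {total + part}")
            total += part
    return steps, total

def make_example(rng, max_value=999):
    a, b = rng.randint(0, max_value), rng.randint(0, max_value)
    steps, answer = worked_solution(a, b)
    return {"prompt": f"What is {a} + {b}?",
            "completion": "; ".join(steps) or str(a),
            "answer": answer}

rng = random.Random(42)                 # seeded, so the data is reproducible
dataset = [make_example(rng) for _ in range(3)]
for ex in dataset:
    print(ex["prompt"], "->", ex["answer"])
```

Because the generator knows the right answer and every intermediate step by construction, the data is perfectly labeled at arbitrary scale, which is exactly what makes synthetic data attractive for training.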
For more speculation on what Q* might be, read this post by a machine-learning scientist who pulls together the context and clues in impressive and logical detail. The TL;DR version is that Q* could be an effort to use reinforcement learning and a few other techniques to improve a large language model's ability to solve tasks by reasoning through steps along the way. Although that might make ChatGPT better at math conundrums, it wouldn't by itself mean that AI systems could evade human control.
That OpenAI would try to use reinforcement learning to improve LLMs seems plausible because many of the company’s early projects, like video-game-playing bots, were centered on the technique. Reinforcement learning was also central to the creation of ChatGPT, because it can be used to make LLMs produce more coherent answers by asking humans to provide feedback as they converse with a chatbot. When WIRED spoke with Demis Hassabis, the CEO of Google DeepMind, earlier this year, he hinted that the company was trying to combine ideas from reinforcement learning with advances seen in large language models.
Rounding up the available clues about Q*, it hardly sounds like a reason to panic. But then, it all depends on your personal P(doom) value: the probability you ascribe to the possibility that AI destroys humankind. Long before ChatGPT, OpenAI's scientists and leaders were so freaked out by the development of GPT-2, a 2019 text generator that now seems laughably puny, that they initially said it could not be released publicly. Now the company offers free access to much more powerful systems.
OpenAI declined to comment on Q*. Perhaps we will get more details when the company decides it's time to share more results from its efforts to make ChatGPT not just good at talking but good at reasoning too.