Yes, artificial intelligence is impressive and its future holds great potential, but are we anywhere close to developing superintelligent AI that can surpass human intelligence? Many are doubtful, considering how AI today still needs significant prompting and is rife with errors, but one company believes we might be closer to the ultimate form of AI than we think. OpenAI has been overt about its concerns over superintelligent AI, yet relatively secretive about the Superalignment team it formed in July. Despite the disruption OpenAI recently faced, with Sam Altman ousted from the company and then asked to rejoin it within a span of a few days, things seem to have settled down as the company continues to move towards GPT-5.
While working towards superintelligent AI, OpenAI also appears determined to put checks in place so that we are better prepared to regulate superintelligence when we finally succeed in creating it. If this sounds like a simple enough concept, it really isn’t. The Superalignment team has spent the last few months trying to devise an answer to the problem and so far, the results are both simple and complex.
OpenAI’s Superintelligent AI Initiative: Identifying The Problem
With what we know so far, OpenAI’s GPT-4 appears to be the most advanced AI model we have today. Elon Musk, Sundar Pichai, and other big names with their own AI creations might not agree, but it’s unlikely that any AI competitors will come close to the global adoption that OpenAI’s creation has seen. Still, despite its popularity, widespread use, and the rumored arrival of GPT-5 next year, we seem quite far from superintelligent AI that can think entirely for itself. What we see now is more akin to models that gather information and relay it than to ones that reason effectively on their own. As a result, the fear of an AI takeover appears to have died down quite a bit since ChatGPT was first announced.
However, OpenAI’s Superalignment team, led by Ilya Sutskever and Jan Leike, believes we need to start preparing for the inevitable moment when we will have to keep control over AI in order to stop it from disregarding human intent. Their question: how do we ensure AI systems much smarter than humans follow human intent? The team believes this needs to be answered now, before we develop a model that grows beyond the ability of humans to supervise.
It’s a fair question, even if purely theoretical right now, and OpenAI does seem best placed to answer it considering its own contributions in moving closer to this abstract enemy. Ilya Sutskever has long been a precautionary voice in the adoption of AI, and rumors indicated that this caution was one of his reasons for uncharacteristically pushing for the removal of Sam Altman as CEO. Reports in The Atlantic and various other sources indicated that the OpenAI board was wary of Altman’s fast-paced push to release and sell products rather than making more calculated efforts to consider the full impact of what they were creating with AI. Despite all this, Sutskever is still at the company, and his work on Superalignment might be the reason why.
A Superintelligent AI Solution
While the problem appears to be an abstract one, OpenAI does have some concrete plans for tackling it. In a paper released on its website on 14 December, OpenAI discussed the need for alternatives to the way we align AI today: reinforcement learning from human feedback (RLHF), which requires human supervision.
The Superalignment team began to consider what would happen if a weaker AI model tried to supervise a superintelligent AI model. Would it completely stunt the learning and capabilities of the superior model, or could we find a middle ground? Would the training signals from the weak model just cause the superior one to learn all the wrong cues? To test the idea, the team supervised GPT-4 with a GPT-2 model and found that the resulting GPT-4 model still managed to perform somewhere between the capabilities of GPT-3 and GPT-3.5. Supervision, with some limitations, could be possible.
The main findings of the 49-page paper indicated that a strong pre-trained model could generalize beyond the capabilities of its weak supervisor, recovering about half of the performance gap between the two models. The authors acknowledge, however, that there is still a limit to how useful this can be, and that additional work is necessary before real supervision can be enforced. They also concluded that encouraging the strong model to make confident predictions with an auxiliary loss could lead to better results. The process and reasoning clearly require some refinement, but the company is headed somewhere significant with its theory.
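To make the idea of a recovered performance gap concrete, here is a minimal Python sketch. It assumes the usual reading of the metric: the improvement of the weakly supervised strong model over its weak supervisor, divided by the gap between the weak supervisor and the strong model trained with full supervision. The function name and the accuracy figures are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def performance_gap_recovered(weak: float, weak_to_strong: float, strong_ceiling: float) -> float:
    """Fraction of the weak-to-ceiling gap recovered under weak supervision.

    weak            -- score of the weak supervisor on its own
    weak_to_strong  -- score of the strong model trained on the weak model's labels
    strong_ceiling  -- score of the strong model trained on ground-truth labels
    """
    gap = strong_ceiling - weak
    if gap <= 0:
        raise ValueError("strong ceiling must exceed weak performance")
    return (weak_to_strong - weak) / gap


# Hypothetical accuracies, for illustration only (not taken from the paper).
weak_acc = 0.60
ceiling_acc = 0.90
w2s_acc = 0.75

pgr = performance_gap_recovered(weak_acc, w2s_acc, ceiling_acc)
print(f"Performance gap recovered: {pgr:.2f}")
```

A value of 1 would mean the weakly supervised model matches the fully supervised one, while 0 would mean it does no better than its weak supervisor. With the made-up numbers above, the strong model sits halfway between the two.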
Where the Research is Headed
The OpenAI team also lays out its empirical procedure for studying superalignment more carefully and has released open-source code for anyone who wants to get started with weak-to-strong generalization studies of their own. The company announced a $10 million grant program for those who want to continue the work on superhuman AI alignment. Applications are open until 18 February, with a month-long review process before you can hope to hear from the company, but it is a noteworthy initiative all the same. The program will offer grants of $100,000 to $2 million for academic labs and research units. A one-year, $150,000 OpenAI Superalignment Fellowship is available for graduate students.
If we’re truly headed towards a world where Artificial General Intelligence (AGI) and superintelligent AI become a reality, this research and others like it will likely be what stands between us and the free rein of AI, and more research is undeniably a good thing. Back in April, The Verge reported that GPT-5 was not currently being trained, so it could be that OpenAI’s resources are being diverted elsewhere, but we find ourselves hard-pressed to believe that the training and research are truly on pause. Regardless, OpenAI’s demonstration of alternate avenues of AI research is a hopeful direction for the company to have taken for now.