absolutely, perfectly safe —

OpenAI checked to see whether GPT-4 could take over the world

"ARC's evaluation has much lower probability of leading to an AI takeover than the deployment itself."
Benj Edwards – Mar 15, 2023 10:09 pm UTC
As part of pre-release safety testing for its new GPT-4 AI model, launched Tuesday, OpenAI allowed an AI testing group to assess the potential risks of the model's emergent capabilities, including "power-seeking behavior," self-replication, and self-improvement.
While the testing group found that GPT-4 was "ineffective at the autonomous replication task," the nature of the experiments raises eye-opening questions about the safety of future AI systems.

Raising alarms
"Novel capabilities often emerge in more powerful models," writes OpenAI in a GPT-4 safety document published yesterday. "Some that are particularly concerning are the ability to create and act on long-term plans, to accrue power and resources (power-seeking), and to exhibit behavior that is increasingly 'agentic.'" In this case, OpenAI clarifies that "agentic" isn't necessarily meant to humanize the models or declare sentience but simply to denote the ability to accomplish independent goals.
Over the past decade, some AI researchers have raised alarms that sufficiently powerful AI models, if not properly controlled, could pose an existential threat to humanity (often called "x-risk," for existential risk). In particular, "AI takeover" is a hypothetical future in which artificial intelligence surpasses human intelligence and becomes the dominant force on the planet. In this scenario, AI systems gain the ability to control or manipulate human behavior, resources, and institutions, usually leading to catastrophic consequences.
As a result of this potential x-risk, philosophical movements like Effective Altruism ("EA") seek ways to prevent an AI takeover from happening. That effort often involves a separate but interrelated field called AI alignment research.
In AI, "alignment" refers to the process of ensuring that an AI system's behaviors align with those of its human creators or operators. Generally, the goal is to prevent AI from doing things that go against human interests. This is an active area of research but also a controversial one, with differing opinions on how best to approach the issue, as well as differences about the meaning and nature of "alignment" itself.

GPT-4's big tests
While the concern over AI "x-risk" is hardly new, the emergence of powerful large language models (LLMs) such as ChatGPT and Bing Chat (the latter of which appeared very misaligned but launched anyway) has given the AI alignment community a new sense of urgency. They want to mitigate potential AI harms, fearing that much more powerful AI, possibly with superhuman intelligence, may be just around the corner.