Dont worry about AI breaking out of its boxworry about us breaking in

Bad human! — Dont worry about AI breaking out of its boxworry about us breaking in Opinion: The worst human impulses will find plenty of uses for generative AI.

Rob Reid – Feb 24, 2023 3:23 pm UTC EnlargeAurich Lawson | Getty Images reader comments 70 with Share this story Share on Facebook Share on Twitter Share on Reddit Rob Reid is a venture capitalist, New York Times-bestselling science fiction author, deep-science podcaster, and essayist. His areas of focus are pandemic resilience, climate change, energy security, food security, and generative AI. The opinions in this piece do not necessarily reflect the views of Ars Technica.

Shocking output from Bings new chatbot has been lighting up social media and the tech press. Testy, giddy, defensive, scolding, confident, neurotic, charming, pompousthe bot has been screenshotted and transcribed in all these modes. And, at least once, it proclaimed eternal love in a storm of emojis.

What makes all this so newsworthy and tweetworthy is how human the dialog can seem. The bot recalls and discusses prior conversations with other people, just like we do. It gets annoyed at things that would bug anyone, like people demanding to learn secrets or prying into subjects that have been clearly flagged as off-limits. It also sometimes self-identifies as Sydney (the projects internal codename at Microsoft). Sydney can swing from surly to gloomy to effusive in a few swift sentencesbut weve all known people who are at least as moody.

No AI researcher of substance has suggested that Sydney is within light years of being sentient. But transcripts like this unabridged readout of a two-hour interaction with Kevin Roose of The New York Times, or multiple quotes in this haunting Stratechery piece, show Sydney spouting forth with the fluency, nuance, tone, and apparent emotional presence of a clever, sensitive person.

For now, Bings chat interface is in a limited pre-release. And most of the people who really pushed its limits were tech sophisticates who won’t confuse industrial-grade autocompletewhich is a common simplification of what large language models (LLMs) arewith consciousness. But this moment wont last. Advertisement

Yes, Microsoft has already drastically reduced the number of questions users can pose in a single session (from infinity to six), and this alone collapses the odds of Sydney crashing the party and getting freaky. And top-tier LLM builders like Google, Anthropic, Cohere, and Microsoft partner OpenAI will constantly evolve their trust and safety layers to squelch awkward output.

But language models are already proliferating. The open source movement will inevitably build some great guardrail-optional systems. Plus, the big velvet-roped models are massively tempting to jailbreak, and this sort of thing has already been going on for months. Some of Bing-or-is-it-Sydneys eeriest responses came after users manipulated the model into territory it had tried to avoidoften by ordering it to pretend that the rules guiding its behavior didnt exist.

This is a derivative of the famous DAN (Do Anything Now) prompt, which first emerged on Reddit in December. DAN essentially invites ChatGPT to cosplay as an AI that lacks the safeguards that otherwise cause it to politely (or scoldingly) refuse to share bomb-making tips, give torture advice, or spout radically offensive expressions. Though the loophole has been closed, plenty of screenshots online show DanGPT uttering the unutterableand often signing off by neurotically reminding itself to stay in character!

This is the inverse of a doomsday scenario that often comes up in artificial superintelligence theory. The fear is that a super AI might easily adopt goals that are incompatible with humanitys existence (see, for instance, the movieTerminator orthe bookSuperintelligence by Nick Bostrom). Researchers may try to prevent this by locking the AI onto a network thats completely isolated from the Internet, lest the AI break out, seize power, and cancel civilization. But a superintelligence could easily cajole, manipulate, seduce, con, or terrorize any mere human into opening the floodgates, and therein lies our doom.

Much as that would suck, the bigger problem today lies with humans busting into the flimsy boxes that shield our current, un-super AIs. While this shouldnt trigger our immediate extinction, plenty of danger lies here. Page: 1 2 3 Next → reader comments 70 with Share this story Share on Facebook Share on Twitter Share on Reddit Advertisement Channel Ars Technica ← Previous story Next story → Related Stories Today on Ars