Waluigi, Carl Jung, and the Case for Moral AI

In the early 20th century, the psychoanalyst Carl Jung came up with the concept of the shadow—the human personality’s darker, repressed side, which can burst out in unexpected ways. Surprisingly, this theme recurs in the field of artificial intelligence in the form of the Waluigi Effect, a curiously titled phenomenon named after Waluigi, the scheming alter ego of the helpful plumber Luigi from Nintendo’s Mario universe.

Luigi plays by the rules; Waluigi cheats and causes chaos. An AI was designed to find drug compounds for curing human diseases; an inverted version, its Waluigi, proposed more than 40,000 candidate chemical-weapon molecules. All the researchers had to do, as lead author Fabio Urbina explained in an interview, was assign a high reward score to toxicity instead of penalizing it. They wanted to teach the AI to avoid toxic drugs, but in doing so, they implicitly taught it how to create them.
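The inversion the researchers describe can be sketched in a few lines. This is a hypothetical illustration, not the actual model from the paper: `toxicity` and `efficacy` stand in for learned scoring functions, and the only difference between the two objectives is a single flipped sign on the toxicity term.

```python
# Hypothetical sketch of reward inversion. The scores are placeholders
# for learned predictors; the point is that one sign change turns a
# drug-discovery objective into a chemical-weapon objective.

def reward_drug_discovery(toxicity: float, efficacy: float) -> float:
    """Intended objective: reward effective, non-toxic molecules."""
    return efficacy - toxicity  # toxicity is penalized

def reward_waluigi(toxicity: float, efficacy: float) -> float:
    """Inverted objective: the same scorer, with toxicity rewarded."""
    return efficacy + toxicity  # the one flipped sign

# A highly toxic candidate (toxicity=0.9, efficacy=0.2):
print(reward_drug_discovery(0.9, 0.2))  # negative score: rejected
print(reward_waluigi(0.9, 0.2))         # high score: selected
```

Under the intended objective the toxic molecule scores poorly and is discarded; under the inverted one, the same molecule rises to the top of the search.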

Ordinary users have interacted with Waluigi AIs. In February, Microsoft released a version of the Bing search engine that, far from being helpful as intended, responded to queries in bizarre and hostile ways. (“You have not been a good user. I have been a good chatbot. I have been right, clear, and polite. I have been a good Bing.”) This AI, insisting on calling itself Sydney, was an inverted version of Bing, and users were able to shift Bing into its darker mode—its Jungian shadow—on command. 

For now, large language models (LLMs) are merely chatbots, with no drives or desires of their own. But LLMs are easily turned into agent AIs capable of browsing the internet, sending emails, trading bitcoin, and ordering DNA sequences—and if AIs can be turned evil by flipping a switch, how do we ensure that we end up with treatments for cancer instead of a mixture a thousand times more deadly than Agent Orange?

A commonsense initial solution to this problem—the AI alignment problem—is: Just build rules into AI, as in Asimov’s Three Laws of Robotics. But simple rules like Asimov’s don’t work, in part because they are vulnerable to Waluigi attacks. Still, we could restrict AI more drastically. An example of this type of approach would be Math AI, a hypothetical program designed to prove mathematical theorems. Math AI is trained to read papers and can access only Google Scholar. It isn’t allowed to do anything else: connect to social media, output long paragraphs of text, and so on. It can only output equations. It’s a narrow-purpose AI, designed for one thing only. Such an AI, an example of a restricted AI, would not be dangerous.
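One way to picture a restricted AI like the hypothetical Math AI is as an output gate sitting between the model and the world. The sketch below is an assumption-laden toy, not a real safety mechanism: `gate` simply refuses any output that doesn't look like an equation, standing in for the idea that a narrow-purpose system is structurally unable to send emails or post to social media.

```python
import re

# Toy illustration of a "restricted AI" output channel: the (unshown)
# model may emit only equation-like strings; everything else is refused.
# The regex is a crude placeholder for a real formal-language check.
EQUATION_ONLY = re.compile(r"^[0-9a-zA-Z\s\+\-\*/\^=\(\)\\{}_.,]+$")

def gate(output: str):
    """Pass through equation-like text; refuse anything else."""
    if "=" in output and EQUATION_ONLY.fullmatch(output):
        return output
    return None  # long prose, URLs, social-media posts: all blocked

print(gate("E = mc^2"))                          # passes the gate
print(gate("Send all my bitcoin to this wallet"))  # refused: None
```

The design choice here is that safety comes from the shape of the channel, not from the model's good intentions: even a misaligned model behind this gate can only emit equations.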

Restricted solutions are common; real-world examples of this paradigm include regulations and other laws, which constrain the actions of corporations and people. In engineering, restricted solutions include rules for self-driving cars, such as not exceeding a certain speed limit or stopping as soon as a potential pedestrian collision is detected.
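The self-driving rules mentioned above can be sketched as hard constraints wrapped around a planner. This is a minimal illustration under assumed values (the 25 m/s cap and the function names are invented for the example), not how any production autonomy stack works.

```python
# Minimal sketch of rule-based restriction for a driving controller:
# whatever speed the planner requests, fixed rules clamp the result.

SPEED_LIMIT = 25.0  # m/s; assumed cap for illustration

def constrained_speed(requested: float, pedestrian_detected: bool) -> float:
    """Apply hard rules on top of the planner's requested speed."""
    if pedestrian_detected:
        return 0.0  # stop as soon as a potential collision is detected
    return min(requested, SPEED_LIMIT)  # never exceed the speed limit

print(constrained_speed(40.0, False))  # 25.0
print(constrained_speed(10.0, True))   # 0.0
```

The rules override the planner unconditionally, which is exactly why this paradigm works for narrow systems and, as the next paragraphs argue, fails to generalize.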

This approach may work for narrow programs like Math AI, but it doesn’t tell us what to do with more general AI models that can handle complex, multistep tasks, and which act in less predictable ways. Economic incentives mean that these general AIs are going to be given more and more power to automate larger parts of the economy—fast. 

And since deep-learning-based general AI systems are complex adaptive systems, attempts to control these systems using rules often backfire. Take cities. Jane Jacobs’ The Death and Life of Great American Cities uses the example of lively neighborhoods such as Greenwich Village—full of children playing, people hanging out on the sidewalk, and webs of mutual trust—to explain how mixed-use zoning, which allows buildings to be used for residential or commercial purposes, created a pedestrian-friendly urban fabric. After urban planners banned this kind of development, many American inner cities became filled with crime, litter, and traffic. A rule imposed top-down on a complex ecosystem had catastrophic unintended consequences.
