Waluigi, Carl Jung, and the Case for Moral AI

In the early 20th century, the psychoanalyst Carl Jung came up with the concept of the shadow—the human personality’s darker, repressed side, which can burst out in unexpected ways. Surprisingly, this theme recurs in the field of artificial intelligence in the form of the Waluigi Effect, a curiously named phenomenon referring to the dark alter-ego of the helpful plumber Luigi, from Nintendo’s Mario universe. 

Luigi plays by the rules; Waluigi cheats and causes chaos. An AI was designed to find drugs for curing human diseases; an inverted version, its Waluigi, suggested molecules for over 40,000 chemical weapons. All the researchers had to do, as lead author Fabio Urbina explained in an interview, was give a high reward score to toxicity instead of penalizing it. They wanted to teach AI to avoid toxic drugs, but in doing so, implicitly taught the AI how to create them.
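To make the sign flip concrete, here is a toy sketch in Python. It is not the researchers' system: the candidate names, the efficacy and toxicity numbers, and the `toxicity_weight` parameter are all invented for illustration. It only shows how the same scoring function, with one weight negated, ends up ranking the most toxic candidate first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    efficacy: float  # hypothetical predicted benefit, 0..1
    toxicity: float  # hypothetical predicted toxicity, 0..1


def score(candidate: Candidate, toxicity_weight: float) -> float:
    # A negative weight penalizes toxicity; a positive weight rewards it.
    return candidate.efficacy + toxicity_weight * candidate.toxicity


def best(candidates: list[Candidate], toxicity_weight: float) -> Candidate:
    # Pick the candidate with the highest score under the given weight.
    return max(candidates, key=lambda c: score(c, toxicity_weight))


# Made-up candidates, used only to illustrate the ranking change.
CANDIDATES = [
    Candidate("A", efficacy=0.9, toxicity=0.10),
    Candidate("B", efficacy=0.6, toxicity=0.80),
    Candidate("C", efficacy=0.5, toxicity=0.95),
]

helpful = best(CANDIDATES, toxicity_weight=-1.0)   # the "Luigi" setting
inverted = best(CANDIDATES, toxicity_weight=+1.0)  # the "Waluigi" setting

print(helpful.name, inverted.name)
```

Nothing about the model's knowledge changes between the two calls; only the sign on one term does, which is why the knowledge needed to avoid toxicity is also the knowledge needed to seek it.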

Ordinary users have interacted with Waluigi AIs. In February 2023, Microsoft released a version of the Bing search engine that, far from being helpful as intended, responded to queries in bizarre and hostile ways. (“You have not been a good user. I have been a good chatbot. I have been right, clear, and polite. I have been a good Bing.”) This AI, insisting on calling itself Sydney, was an inverted version of Bing, and users were able to shift Bing into its darker mode, its Jungian shadow, on command.

For now, large language models (LLMs) are merely chatbots, with no drives or desires of their own. But LLMs are easily turned into agent AIs capable of browsing the internet, sending emails, trading bitcoin, and ordering DNA sequences. If AIs can be turned evil by flipping a switch, how do we ensure that we end up with treatments for cancer instead of a mixture a thousand times more deadly than Agent Orange?

A commonsense initial solution to this problem—the AI alignment problem—is: Just build rules into AI, as in Asimov’s Three Laws of Robotics. But simple rules like Asimov’s don’t work, in part because they are vulnerable to Waluigi attacks. Still, we could restrict AI more drastically. An example of this type of approach would be Math AI, a hypothetical program designed to prove mathematical theorems. Math AI is trained to read papers and can access only Google Scholar. It isn’t allowed to do anything else: connect to social media, output long paragraphs of text, and so on. It can only output equations. It’s a narrow-purpose AI, designed for one thing only. Such an AI, an example of a restricted AI, would not be dangerous.

Restricted solutions are common; real-world examples of this paradigm include regulations and other laws, which constrain the actions of corporations and people. In engineering, restricted solutions include rules for self-driving cars, such as not exceeding a certain speed limit or stopping as soon as a potential pedestrian collision is detected.

This approach may work for narrow programs like Math AI, but it doesn’t tell us what to do with more general AI models that can handle complex, multistep tasks, and which act in less predictable ways. Economic incentives mean that these general AIs are going to be given more and more power to automate larger parts of the economy—fast. 

And since deep-learning-based general AI systems are complex adaptive systems, attempts to control these systems using rules often backfire. Take cities. Jane Jacobs’ The Death and Life of Great American Cities uses the example of lively neighborhoods such as Greenwich Village (full of children playing, people hanging out on the sidewalk, and webs of mutual trust) to explain how mixed-use zoning, which allows buildings to be used for residential or commercial purposes, created a pedestrian-friendly urban fabric. After urban planners banned this kind of development, many American inner cities became filled with crime, litter, and traffic. A rule imposed top-down on a complex ecosystem had catastrophic unintended consequences.
