When artificial intelligence image generators first rolled out, they seemed like magic. Churning out detailed imagery in minutes was, from one angle, a technical marvel. From another angle, though, it looked like mere mimicry.
The models were trained on billions of images without anyone asking the humans behind them for permission. “They have sucked the creative juices of millions of artists,” says Eva Toorenent, an illustrator who serves as the Netherlands adviser for the European Guild for Artificial Intelligence Regulation. “It is absolutely horrifying.”
As AI company valuations soared, the people whose work provided the bedrock for their products saw no compensation. Many artists ardently oppose how AI image generators use their work. “To see corporations scrape our style and then attempt to replace us with bastardized versions of our own work is beyond disgusting,” artist and writer Molly Crabapple says.
Over the past year, as image-generating AI tools have grown in popularity, illustrators, photographers, and other visual artists have struggled to determine what they can do to have a say in how their work is used. Some are attempting lawsuits; others are asking regulators to step in. There’s nothing they can do to change how generators have been trained in the past. Starting today, though, the startup Spawning is launching a new tool to help artists who want to block new attempts to train AI on their work. Called Kudurru, it is a network of websites that identifies web scraping as it’s happening. (The name comes from a Mesopotamian term for stones that denoted boundaries and ownership.)
To understand exactly how Kudurru works, it helps to know how image generators are trained. Most of these generators find their training data by “scraping” the internet. Scrapers use software that collects data in bulk from across the web, from platforms like DeviantArt and professional libraries like Getty Images to individual artists’ websites. One of the most commonly used roadmaps for deciding what to scrape is the dataset LAION-5B, which lists the URLs of billions of images. When an AI company uses a dataset like LAION-5B to scrape images, it has to download those images from the listed URLs. That’s where Kudurru finds its opening.
According to Spawning cofounder Jordan Meyer, during internal testing, Kudurru was able to briefly stymie a substantial amount of scraping activity. “For about two hours in July, we stopped everyone who was in the process of downloading the LAION-5B dataset,” Meyer says.
To identify the scrapers, Spawning operates a honeypot-like “defense network” of more than 1,000 websites, each hosting images that groups using LAION-5B would scrape to train a generative AI model. These websites collect data on the IP addresses attempting to scrape images; Spawning can often identify the groups doing the scraping and the regions with the most overall scraping activity. (China is currently in the lead.)
“We’re developing what is basically a blacklist,” Spawning cofounder Patrick Hoepner says. Spawning is also the company behind Have I Been Trained?, a site that lets creators see if AI has scraped their work. It updates this blacklist in real time, based on the behavior of the IP addresses it tracks.
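Spawning has not published the rules it uses to decide which addresses land on the list, so the following is only a rough sketch of the general idea: decoy sites log who requests their images, and an address that shows up across enough of them gets flagged right away. The class name, the threshold of three sites, and the example domains and IP addresses are all invented for illustration.

```python
from collections import defaultdict


class HoneypotBlocklist:
    """Toy blocklist fed by image-request logs from decoy sites.

    Everything here is hypothetical; it is not Spawning's implementation.
    """

    def __init__(self, site_threshold=3):
        # Assumed rule: flag an IP once it has requested images
        # from this many distinct decoy sites.
        self.site_threshold = site_threshold
        self.sites_seen = defaultdict(set)  # ip -> decoy sites it has hit
        self.blocked = set()

    def record_request(self, ip, decoy_site):
        """Log one image request and update the blocklist immediately."""
        self.sites_seen[ip].add(decoy_site)
        if len(self.sites_seen[ip]) >= self.site_threshold:
            self.blocked.add(ip)

    def is_blocked(self, ip):
        return ip in self.blocked


blocklist = HoneypotBlocklist(site_threshold=3)

# One address works its way through three decoy sites, as a bulk
# downloader following a URL list would.
for site in ["decoy-a.example", "decoy-b.example", "decoy-c.example"]:
    blocklist.record_request("203.0.113.7", site)

# Another address requests a single image, as an ordinary visitor might.
blocklist.record_request("198.51.100.2", "decoy-a.example")

print(blocklist.is_blocked("203.0.113.7"))   # True
print(blocklist.is_blocked("198.51.100.2"))  # False
```

A real system would presumably weigh more signals than a simple count, such as request timing, but the sketch shows why a larger network of decoy sites makes scrapers easier to spot.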
Kudurru gives artists two options to disrupt scraping. First, they can simply block the blacklisted IP addresses. Second, to take things a step further, they can also choose to sabotage or “poison” the scrapers’ efforts by sending back a different image than the one requested. Spawning gives users the option to choose what images they send back, although it does have some suggestions. “It could just be a middle finger over and over again,” Meyer says.
This “poisoning” could have a cumulative effect of spoiling how generators interpret prompts; if I made a personal website of my photography and used Kudurru to send back middle fingers, for example, a generator might start associating the prompt “Kate Knibbs photography style” with obscene hand gestures.
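In code, the two options amount to a small decision made on every image request. The sketch below is a simplified illustration, not Kudurru’s actual plug-in code: the function name, the `mode` setting, the file names, and the IP addresses are all assumptions.

```python
def respond(ip, requested_image, blocked_ips, mode="block", decoy_image="decoy.png"):
    """Decide what to send back for one image request.

    Returns a (status_code, image_name) pair. Hypothetical helper for
    illustration only.
    """
    if ip not in blocked_ips:
        # Ordinary visitors get the image they asked for.
        return 200, requested_image
    if mode == "poison":
        # Option two: answer normally, but with a different image.
        return 200, decoy_image
    # Option one: refuse the request outright.
    return 403, None


blocked_ips = {"203.0.113.7"}  # made-up blocklist entry

visitor = respond("198.51.100.2", "portrait.jpg", blocked_ips)
refused = respond("203.0.113.7", "portrait.jpg", blocked_ips, mode="block")
poisoned = respond("203.0.113.7", "portrait.jpg", blocked_ips, mode="poison")

print(visitor)   # (200, 'portrait.jpg')
print(refused)   # (403, None)
print(poisoned)  # (200, 'decoy.png')
```

The poisoned response looks like a success to the scraper, which is the point: the wrong image gets stored alongside the original caption.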
Spawning believes that its tool could meaningfully hinder how AI image generators currently train. As more people use Kudurru, the network will grow in size and power.
The beta version of Kudurru is limited in scope; it’s a WordPress plug-in for now, although Spawning plans to roll out additional plug-ins, as well as integrations for video and audio. (It hopes to introduce text eventually, but it’s much harder to prevent text scraping.)
While Kudurru offers artists a new way to resist AI training, it’s not the first or only tool available that is designed to stop unwanted web scraping. Earlier this year, a team at the University of Chicago released Glaze, another type of tool that attempts to confuse scrapers. Glaze adds what it calls a “cloak” to an image, essentially an invisible watermark, designed to thwart scraping attempts.
Meanwhile, bot-protection companies like DataDome have been offering services to deter scraping for years and have recently seen a huge shift in response to the rise of generative AI. CEO Benjamin Fabre told WIRED that he has seen a surge in customers looking for protection against AI-related scrapers. “Seventy percent of our customers reach out to us asking to make sure DataDome is blocking ChatGPT” and other large language models, he says.
Although companies like DataDome are well-established, they cater to large corporations and charge accordingly; they’re usually not accessible to individuals. Kudurru’s arrival, then, is promising precisely because it is offering a free tool aimed at regular people.
Still, Kudurru is far from a broad or permanent solution for artists who want to stop AI scraping; even its creators envision it as a stopgap measure as people wait for meaningful regulatory or legislative action to manage how AI is trained. Most artist advocates believe that these companies will not stop scraping for training data voluntarily.
Copyright activist Neil Turkewitz sees it as a “speed bump” for AI generators, not an industrywide fix. “I think they’re great. They should be developed, and people should use them,” Turkewitz says. “And it’s absolutely essential we don’t view these technical measures as the solution.”
“I applaud attempts to develop tools to help artists,” Crabapple says. “But they ultimately put the burden on us, and that’s not where it should be. We shouldn’t have to play whack-a-mole to keep our work from being stolen and regurgitated by multibillion-dollar companies. The only solution to this is a legislative one.”
A larger-scale, permanent change in how generators train will likely need to come from governments; it is highly unlikely that the larger generative AI companies will stop web scraping voluntarily. Some are attempting to appease critics by creating opt-out features, where people who don’t want their work to be used can ask to be removed from future training sets. These measures have been viewed as half-baked at best by many artists, who want to see a world in which training takes place only if they’ve opted into participation.
To make matters worse, companies have started developing their own opt-out protocols one by one rather than settling on a common system, making it time-consuming for artists to withdraw their work from each individual generator. (Spawning previously worked on an early opt-out tool for Have I Been Trained? but sees the fragmentation as “disappointing,” according to Meyer.)
The European Union has come the furthest in developing legal frameworks for artistic consent to AI training. “It’s going incredibly well,” Toorenent says. She is optimistic that the AI Act could be the beginning of the end of the training free-for-all. Of course, the rest of the planet would have to catch up—and the AI Act would help artists enforce choices to opt out, not shift the model to opt-in. In other words, the world is a long, long way off from the dream of an opt-in training structure becoming a reality. In the meantime—well, there’s Kudurru.