
Perplexity Plagiarized Our Story About How Perplexity Is a Bullshit Machine


Earlier this week, WIRED published a story about the AI-powered search startup Perplexity, which Forbes has accused of plagiarism. In it, my colleague Dhruv Mehrotra and I reported that the company was surreptitiously scraping websites, using crawlers to visit and download pages that developers had tried to block it from accessing, in violation of its own publicly stated policy of honoring the Robots Exclusion Protocol.
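The Robots Exclusion Protocol works through a plain-text robots.txt file that a compliant crawler is expected to check before fetching any page. A minimal sketch of that check, using Python's standard-library parser (the site URL is made up for illustration; "PerplexityBot" is the crawler name Perplexity publicly documents):

```python
from urllib import robotparser

# A hypothetical robots.txt of the kind a developer would publish
# to tell Perplexity's crawler to stay away from the entire site.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler that honors the protocol runs this check before every fetch.
print(parser.can_fetch("PerplexityBot", "https://example.com/article"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))   # True
```

The protocol is purely voluntary: nothing technically prevents a crawler from skipping this check, which is why observing requests from a blocked bot in server logs is meaningful evidence.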

Our findings, as well as those of the developer Robb Knight, identified a specific IP address almost certainly linked to Perplexity and not listed in its public IP range, which we observed scraping test sites in apparent response to prompts given to the company’s public-facing chatbot. According to server logs, that same IP visited properties belonging to Condé Nast, the media company that owns WIRED, at least 822 times in the past three months—likely a significant undercount, because the company retains only a small portion of its records.

We also reported that the chatbot was bullshitting, in the technical sense. In one experiment, it generated text about a girl following a trail of mushrooms when asked to summarize the content of a website that its agent did not, according to server logs, attempt to access.

Perplexity and its CEO, Aravind Srinivas, did not substantively dispute the specifics of WIRED’s reporting. “The questions from WIRED reflect a deep and fundamental misunderstanding of how Perplexity and the Internet work,” Srinivas said in a statement. Backed by Jeff Bezos’ family office and by Nvidia, among others, Perplexity has said it is worth a billion dollars based on its most recent fundraising round, and The Information reported last month that it was in talks for a new round that would value it at $3 billion. (Bezos did not reply to an email; Nvidia declined to comment.)

After we published the story, I prompted three leading chatbots to tell me about the story. OpenAI’s ChatGPT and Anthropic’s Claude generated text offering hypotheses about the story’s subject but noted that they had no access to the article. The Perplexity chatbot produced a six-paragraph, 287-word text closely summarizing the conclusions of the story and the evidence used to reach them. (According to WIRED’s server logs, the same bot observed in our and Knight’s findings, which is almost certainly linked to Perplexity but is not in its publicly listed IP range, attempted to access the article the day it was published, but was met with a 404 response. The company doesn’t retain all its traffic logs, so this is not necessarily a complete picture of the bot’s activity, or that of other Perplexity agents.) The original story is linked at the top of the generated text, and a small gray circle links out to the original following each of the last five paragraphs. The last third of the fifth paragraph exactly reproduces a sentence from the original: “Instead, it invented a story about a young girl named Amelia who follows a trail of glowing mushrooms in a magical forest called Whisper Woods.”

This struck me and my colleagues as plagiarism. It certainly appears to satisfy the criteria set out by the Poynter Institute—including, perhaps most stringently, the seven-to-10-word test, which proposes that it's "hard to incidentally replicate seven consecutive words that appear in another author's work." (Kelly McBride, a Poynter SVP who has described this test as being useful in identifying plagiarism, did not reply to an email.)
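The seven-consecutive-words heuristic is simple enough to mechanize. As a rough sketch (the function name and tokenization choices here are mine, not Poynter's), it amounts to checking whether the two texts share any run of seven words:

```python
import re

def shares_seven_word_run(text_a: str, text_b: str, n: int = 7) -> bool:
    """Return True if any run of n consecutive words from text_a also
    appears, in order, in text_b (case-insensitive, punctuation stripped)."""
    def word_runs(text: str) -> set:
        words = re.findall(r"[a-z0-9']+", text.lower())
        # All n-word windows in the text, as hashable tuples.
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    return bool(word_runs(text_a) & word_runs(text_b))

original = ("Instead, it invented a story about a young girl named Amelia "
            "who follows a trail of glowing mushrooms in a magical forest "
            "called Whisper Woods.")
summary = ("...a young girl named Amelia who follows a trail of glowing "
           "mushrooms in a magical forest called Whisper Woods.")

print(shares_seven_word_run(original, summary))  # True
print(shares_seven_word_run(original, "An unrelated sentence about search engines."))  # False
```

The heuristic is a screening tool, not proof: it flags verbatim overlap like the sentence reproduced by Perplexity's chatbot, but the judgment of whether that overlap constitutes plagiarism remains a human one.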

“If one of my students turned in a story like this, I would take them before the academic dishonesty committee for plagiarism,” said John Schwartz, professor of practice at the University of Texas at Austin’s journalism school, after reading the original story and the summary. “I find this just too close. When I was reading the Perplexity version, I just thought, there’s an echo in here.”

Perplexity and Srinivas did not respond to a detailed request for comment presenting the criticisms experts made of the company for this story.

Bill Grueskin, professor of professional practice at Columbia Journalism School, wrote in an email that the summary looked to be “pretty much ok” for a chatbot identified as such, but that it was hard to say because he hadn’t had time to read the original WIRED story. “Quoting a sentence verbatim without quote marks is bad, of course,” he wrote. “I’d be pretty mortified if a news org ran an AI summary like this without disclosing the source—or worse, pretending it came from a human.” (Perplexity, of course, isn’t claiming this material came from a human.)

Perhaps luckily for Perplexity and its backers, this is a literally academic debate. Plagiarism is a concept of professional ethics, important in contexts like journalism and academia where identifying the source of information is fundamental, but it carries no legal significance in itself. If a rival studio released a film containing a sizable chunk of footage from Inside Out 2, Disney would sue not for plagiarism but for copyright infringement; similarly, a letter Forbes reportedly sent Perplexity threatening legal action is said to mention “willful infringement” of Forbes’ copyrights. Here, legal experts say, Perplexity is on somewhat safer ground—probably.

“In terms of the copyright, this is a tough call,” says James Grimmelmann, professor of digital and information law at Cornell University. On one hand, he argues, the summary is reporting facts, which cannot be copyrighted; but on the other, it does partially duplicate the original and summarize the details found in it. “It’s not a slam dunk copyright case, but it’s not trivial, either. It’s not frivolous.”

Grimmelmann sees a host of potential issues for Perplexity, among them consumer protection, unfair advertising, or deceptive trade practices claims he believes could be made against a company that says it respects the Robots Exclusion Protocol but doesn’t follow it. (The standard is voluntary but widely adhered to.) He also thinks it could be vulnerable to a claim of hot-news misappropriation, in which a publisher argues that a competitor is unlawfully free-riding by summarizing its material before the publisher has had a chance to benefit commercially from it, or in a way that undermines its value to paying subscribers. Perplexity’s evident ability to circumvent paywalls “is a bad fact for them,” he says, as is the fact that its system is automated.

Grimmelmann also says that Perplexity may be forfeiting the protection of Section 230 of the Communications Decency Act. This is the law that, among other things, protects search engines like Google from liability for defamation when they link to defamatory content because they are services passing on information from other content providers; as he sees it, Perplexity is similarly shielded as long as it accurately summarizes material. (Whether AI-generated material enjoys 230 protection at all is a matter of debate.)

“They’d only get in trouble if they summarized the story incorrectly and made it defamatory when it wasn’t before. That’s something that they actually would be at legal risk for, especially if they don’t credit the original source clearly enough and people can’t easily go to that source to check,” he says. “If Perplexity’s edits are what make the story defamatory, 230 doesn’t cover that, under a bunch of case law interpreting it.”

In one case WIRED observed, Perplexity’s chatbot did falsely claim, albeit while prominently linking to the original source, that WIRED had reported that a specific police officer in California had committed a crime. (“We have been very upfront that answers will not be accurate 100% of the time and may hallucinate,” Srinivas said in response to questions for the story we ran earlier this week, “but a core aspect of our mission is to continue improving on accuracy and the user experience.”)

“If you want to be formal,” says Grimmelmann, “I think this is a set of claims that would get past a motion to dismiss on a bunch of theories. Not saying it will win in the end, but if the facts bear out what Forbes and WIRED, the police officer—a bunch of possible plaintiffs—allege, they are the kinds of things that, if proven and other facts were bad for Perplexity, could lead to liability.”

Not all experts agree with Grimmelmann. Pam Samuelson, professor of law and information at UC Berkeley, writes in an email that copyright infringement is “about use of another’s expression in a way that undercuts the author’s ability to get appropriate remuneration for the value of the unauthorized use. One sentence verbatim is probably not infringement.”

Bhamati Viswanathan, a faculty fellow at New England Law, says she’s skeptical the summary passes a threshold of substantial similarity usually necessary for a successful infringement claim, though she doesn’t think that’s the end of the matter. “It certainly should not pass the sniff test,” she wrote in an email. “I would argue that it should be enough to get your case past the motion to dismiss threshold—particularly given all the signs you had of actual stuff being copied.”

In all, though, she argues that focusing on the narrow technical merits of such claims may not be the right way to think about things, as tech companies can adjust their practices to honor the letter of dated copyright laws while still grossly violating their purpose. She believes an entirely new legal framework may be necessary to correct for market distortions and promote the underlying aims of US intellectual property law, among them to allow people to financially benefit from original creative work like journalism so that they’ll be incentivized to produce it—with, in theory, benefits to society.

“There are, in my opinion, strong arguments to support the intuition that generative AI is predicated upon large scale copyright infringement,” she writes. “The opening ante question is, where do we go from there? And the greater question in the long run is, how do we ensure that creators and creative economies survive? Ironically, AI is teaching us that creativity is more valuable and in demand than ever. But even as we recognize this, we see the potential for undermining, and ultimately eviscerating, the ecosystems that enable creators to make a living from their work. That’s the conundrum we need to solve—not eventually, but now.”
