When Meta debuted its new Threads app earlier this month, it was quickly dubbed “the Twitter killer.” Elon Musk, the owner of X (formerly Twitter), even threatened to sue Meta for what he called a “copycat” product.
Launched off the back of Instagram, also owned by Meta, Threads looks very similar to X. It’s scrollable, text-based, and character-limited. But why, when X has been notoriously unprofitable, would Meta—which brought us the infamous “pivot to video” and has had its sights set on competing with TikTok—want to take on the platform? The answer may have to do with artificial intelligence.
Recent months have seen a veritable AI arms race, with tools like ChatGPT, Midjourney, Stable Diffusion, Copilot, Dall-E, and Google’s Bard all jockeying for users. As more companies invest in generative AI, they need lots of data to train their models. And that data needs to be generated by actual humans in order for the generative AI to appear, well, human. Platforms like Reddit and X are gold mines because they host millions of examples of user-generated content. Both companies have also historically made their data readily available, a boon for third-party developers and researchers. In 2020 alone, data from X contributed to more than 17,000 research papers. Models like ChatGPT and Bard were also trained on data from these platforms. But this has sparked bigger questions about how much user-generated data is worth, and what it should cost to access. Now, that data may not be readily available for long, just as every company, including Meta, is rushing to develop its own models.
Earlier this year, Musk announced that X would begin charging $42,000 a month for its API, pricing out nearly everyone who used it, particularly academics and researchers, for whom data from X was crucial for research into topics like disinformation. Later, the company said it would offer tiers of access priced at $125,000 and $210,000 per month. Not long thereafter, Reddit announced it would also start charging for its API. In an interview with The New York Times, Reddit CEO Steve Huffman acknowledged that the “Reddit corpus of data is really valuable” for training AI models but that the company didn’t feel the “need to give all of that value to some of the largest companies in the world for free.”
In the past several months, Musk has continued to crack down on access to X’s data. In April, he tweeted that Microsoft had “illegally” used data from X to train its AI models (Microsoft is a partner of both Meta and OpenAI, which created ChatGPT). A letter from X’s lawyer alleged that Microsoft had exceeded the allowed use of the data it drew from the platform. Then, last month, Twitter announced it would restrict the ability to see the site’s content without first logging in, and that to see more than 600 tweets per day, users would need to pay for Twitter Blue. Musk called it a “temporary emergency measure” to prevent what he called “data pillaging.” (X Corp., which owns X, filed a lawsuit shortly after against four unnamed defendants, seeking $1 million in damages for data scraping.) As Musk has limited access to the platform, he has also launched his new xAI startup, whose models will be trained on data from X.
What does all this have to do with Threads? Meta, which gambled its future—and its name—on the metaverse, has fallen behind in its investment in AI. But last week, the company announced it would be making its large language model, Llama 2, open source, free for researchers and businesses alike (this also means, however, it won’t have some of the safeguards of ChatGPT, namely the ability to revoke access from users who violate the terms of use to generate, say, disinformation). Threads could bolster its efforts to get back in the AI game, just when X is no longer an option for companies seeking to harvest data.