Reddit Sues Perplexity and Data Firms for Alleged Data Theft, Accuses Them of Bypassing Digital Guardrails

ziapirzada, 4 hours ago

Reddit has filed a sweeping lawsuit against Perplexity and three data-scraping firms (Oxylabs UAB, AWM Proxy, and SerpApi), accusing them of illegally scraping its content and circumventing its technical protections.

The lawsuit, lodged Wednesday in Manhattan federal court, claims that the companies circumvented Reddit’s safeguards by exploiting Google’s search results to harvest data from the platform, according to Business Insider.

“These Defendants are similar to would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead,” the lawsuit states, alleging that the firms effectively stole Reddit’s user-generated content for commercial use.

The legal action is one of the most aggressive moves yet by Reddit as it seeks to assert control over its vast archive of public conversations — an increasingly valuable dataset in the age of artificial intelligence.

According to Reddit, Perplexity ignored a cease-and-desist letter sent in May 2024, which demanded that it stop scraping data unless it reached a licensing deal similar to those Reddit signed with Google and OpenAI. Although Perplexity initially told Reddit it would “respect Reddit’s robots.txt,” the lawsuit says its citations to Reddit surged “forty-fold after Reddit told it to stop.”

“Rather than respect Reddit and its users’ rights, what Perplexity has done in response is simply come up with increasingly devious schemes to circumvent Reddit’s security systems and policies,” the lawsuit claims.

Reddit alleges that Perplexity used at least one of the other named scraping firms to ingest its data into large language models (LLMs).

“In other words, Perplexity’s business model is effectively to take Reddit’s content from Google search results, feed them into a third party’s LLM, and call it a new product,” the complaint reads. “While that business model has somehow translated into a $20 billion valuation, it has not resulted in a willingness to pay for what others (including Google) have.”

Perplexity spokesperson Jesse Dwyer responded that the company “will always fight vigorously for users’ rights to freely and fairly access public knowledge,” adding that its approach “remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”

Representatives for Oxylabs and SerpApi said they plan to defend themselves, while AWM Proxy, described in the suit as a former Russian botnet, could not be reached for comment.

Reddit’s ‘Marked Bill’ Trap and Evidence of Scraping

The lawsuit details how Reddit set up a digital “marked bill” trap to show that Perplexity was obtaining its data through Google. The company created a test post visible only to Google’s search engine.

“Within hours, queries to Perplexity’s ‘answer engine’ produced the contents of that test post,” the filing states. Reddit argues that, because the post could be reached no other way, this shows its content was being scraped from Google’s search results.

Cloudflare CEO Matthew Prince weighed in on the controversy earlier this year, likening Perplexity’s alleged tactics to those of cybercriminals.

“Some supposedly ‘reputable’ AI companies act more like North Korean hackers,” Prince wrote on X in August. “Time to name, shame, and hard block them.”

Reddit’s Chief Legal Officer Ben Lee said the lawsuit highlights the growing problem of illicit scraping operations that feed the AI industry.

“Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material,” Lee told Business Insider. “Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created.”

The company said it has invested tens of millions of dollars over the years to combat automated data collection.

AI Partnerships and the Battle Over Data Control

The lawsuit comes as Reddit doubles down on turning its trove of user-generated content into a profitable asset. In February 2024, Reddit struck a lucrative licensing deal with Google that allows the search giant to train its AI models on Reddit posts. In return, Reddit gained access to Google’s Vertex AI tools to improve its own search and content moderation capabilities.

“Reddit is one of the few platforms positioned to become a true search destination,” the company said in its Q2 2024 report. “We offer something special: a breadth of conversations and knowledge you can’t find anywhere else. Every week, hundreds of millions of people come to Reddit looking for advice, and we’re turning more of that intent into active users of Reddit’s native search.”

The Google deal was announced about a month before Reddit’s highly anticipated IPO, which valued the company at $6.4 billion.

The Reddit-Perplexity clash is part of a growing tension between content owners and AI developers over who controls access to public data. AI companies argue that publicly available information should remain free to use for training models, while content owners such as Reddit, The New York Times, and Getty Images insist that unauthorized scraping amounts to intellectual property theft.

For Reddit, which has been positioning itself as both a social network and a data company, the lawsuit marks an effort to establish new boundaries in the age of generative AI — and to send a clear message that it intends to monetize its content on its own terms.