Perplexity AI Faces Legal Scrutiny Over Content Plagiarism Allegations
Perplexity AI, which was once predicted to replace Google Search, is under investigation for allegedly copying news items without giving due credit to the original authors. Forbes threatened legal action against the generative AI-powered search engine in early June, alleging that it had plagiarized its work. Shortly afterward, a Wired investigation suggested that Perplexity AI may also be freely replicating content from other well-known news websites.
Since then, a number of AI firms have come under fire for allegedly bypassing paywalls and technical restrictions put in place by publishers to stop their online content from being used to train AI models and produce summaries.
Perplexity AI's CEO, Aravind Srinivas, has blamed a third-party service, but the controversy surrounding the startup is the latest flashpoint between AI companies, which argue they should be allowed to use such content, and news publishers, who say their work is being copied without authorization.
How did it start?
Aravind Srinivas, an IIT Madras graduate, worked at renowned tech companies such as Google, DeepMind, and OpenAI before launching Perplexity, which aims to change how search results are presented to users by answering their questions with tailored, AI-generated responses.
In an interview, Srinivas explained how Perplexity AI accomplishes this by “crawling the web, pulling the relevant sources, only using the content from those sources to answer the question, and always telling the user where the answer came from through citations or references.”
Perplexity was therefore seen as a smaller player in the search engine industry, up against tech giants like Google and Microsoft. Things took a different turn when it released a tool called “Pages,” which lets users enter a prompt and receive an AI-generated report, complete with research and citations, that can be published as a web page and shared with anyone.
Why are publishers questioning Perplexity AI?
Apart from allegedly copying content and getting around paywalls, Perplexity AI has also been accused of ignoring web standards such as the robots.txt file.
“A robots.txt file contains instructions for bots that tell them which web pages they can and cannot access,” explains cybersecurity company Cloudflare.
Robots.txt is mostly relevant to web crawlers like those Google uses to scan the internet and index content for search results. A site administrator can include directives in the file that tell crawlers not to process data on restricted pages or directories.
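To illustrate, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The site URL and user-agent string below are placeholders for illustration, not the actual setup of any bot mentioned in this story.

```python
from urllib import robotparser

# Hypothetical site and bot name, used purely for illustration.
SITE = "https://example.com"
USER_AGENT = "ExampleNewsBot"

# Load and parse the site's robots.txt file.
parser = robotparser.RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()

# A compliant crawler checks permission before fetching each URL.
url = f"{SITE}/articles/some-story"
if parser.can_fetch(USER_AGENT, url):
    print(f"{USER_AGENT} may crawl {url}")
else:
    # Nothing technically stops a bot from fetching anyway;
    # robots.txt is a convention, not an enforcement mechanism.
    print(f"robots.txt asks {USER_AGENT} not to crawl {url}")
```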
Robots.txt is not legally enforceable, though, so AI bots can simply disregard the instructions in the file, which makes it a weak defense against them. According to Wired, that is precisely what Perplexity did: the tech news outlet found that Perplexity AI could access its content and produce a summary of it even though its robots.txt file forbade the AI bot from scraping the site, confirming the findings of a developer named Robb Knight.
However, Perplexity AI is not the only company using dubious data harvesting techniques. According to a Wired report, Quora’s AI chatbot Poe goes beyond a summary by offering users the option to download an HTML file containing paywalled articles. Additionally, an increasing number of AI agents “are opting to bypass the robots.txt protocol to retrieve content from sites,” according to content licensing company Tollbit.
Another way for publishers to stop AI bots?
Given the growing trend of AI bots allegedly circumventing paywalled websites and flouting web standards, the question of what further steps publishers can take to stop their online content from being scraped and used without permission has become crucial.
Reddit, for its part, says it uses rate limiting in addition to updating its robots.txt file. Rate limiting simply restricts the number of times a client can perform a particular operation (such as signing into a website) within a given time frame. The approach is not perfect, but it can help separate legitimate traffic from automated AI traffic.
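As a rough illustration of the idea (not Reddit's actual implementation, whose details are not public), here is a minimal fixed-window rate limiter in Python; the per-window limit and the client identifier are arbitrary placeholders.

```python
import time
from collections import defaultdict

# Hypothetical quota, used purely for illustration.
MAX_REQUESTS_PER_WINDOW = 60   # requests allowed per client per window
WINDOW_SECONDS = 60            # length of each window in seconds

# Maps each client (e.g. an IP address or API key) to (window_start, count).
_windows = defaultdict(lambda: (0.0, 0))

def allow_request(client_id: str) -> bool:
    """Return True if the client is still under its quota for the current window."""
    now = time.time()
    window_start, count = _windows[client_id]

    # Start a fresh window if the old one has expired.
    if now - window_start >= WINDOW_SECONDS:
        _windows[client_id] = (now, 1)
        return True

    # Reject once the quota for this window is exhausted.
    if count >= MAX_REQUESTS_PER_WINDOW:
        return False

    _windows[client_id] = (window_start, count + 1)
    return True

# Example: a scraper hammering the site quickly hits the limit.
for i in range(65):
    if not allow_request("203.0.113.7"):
        print(f"Request {i + 1} blocked by rate limiter")
        break
```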