We all know that artificial intelligence needs data to learn. Lots of data. But a new bombshell legal filing suggests that the tech giant Nvidia didn’t just accidentally stumble upon copyrighted books. They allegedly went out and asked for them.
In a shocking update to a class-action lawsuit, authors claim that Nvidia employees directly contacted Anna’s Archive—one of the world’s most infamous pirate libraries—to get their hands on millions of books. This wasn’t just a small mistake; it looks like a calculated move.
Desperate for data
According to the new court documents filed in California, Nvidia was feeling “competitive pressure” to build the best AI models. To do that, they needed high-quality text, like books. Desperate times apparently called for desperate measures.
The plaintiffs, who are authors suing the company for copyright infringement, say they have proof. They cite internal emails in which a member of Nvidia’s data strategy team reached out to the operators of Anna’s Archive. The goal? To get “high-speed access” to the site’s massive collection of pirated materials. Nvidia allegedly didn’t want to download books one by one; it wanted the whole library.
Ignoring the warnings
Here is where the story gets even crazier. When Nvidia made contact, the people running Anna’s Archive actually warned them. They reportedly told Nvidia clearly that the books were acquired illegally and that using them might be risky.
You might think a trillion-dollar company would stop there. Think again.
The lawsuit alleges that Nvidia executives discussed this warning and, within just one week, gave the “green light” to proceed anyway. According to the plaintiffs, the company needed the data badly enough to accept the legal risks. As a result, Anna’s Archive allegedly gave Nvidia access to download roughly 500 terabytes of data. At a few megabytes per ebook, that is enough to hold millions of books.
Beyond just one source
It wasn’t just Anna’s Archive. The lawsuit claims Nvidia used other controversial sources too. These include:
- Books3: A notorious dataset of pirated books.
- LibGen (Library Genesis): A huge shadow library for scientific papers and books.
- Sci-Hub: Known for bypassing paywalls on research papers.
- Z-Library: Another massive ebook repository.
The authors argue that Nvidia didn’t just use these books for itself. The company allegedly created tools and scripts that helped its customers download these illegal datasets too. This adds a layer of “vicarious infringement” to the case, meaning Nvidia could be held liable for infringement committed by others.
Nvidia’s defense
So, what does Nvidia say about all this? In the past, their defense has been simple: Fair Use.
They argue that teaching an AI model isn’t the same as copying a book to sell it. They claim their AI just looks at “statistical correlations” in the text—basically learning how words fit together—rather than “reading” the book like a human does. They believe this shouldn’t count as copyright infringement.
However, these new emails might make that defense harder to sell. It is one thing to scrape the open internet; it is another to email a known pirate site and ask for high-speed bulk access to books you have been told were acquired illegally.
What this means for the future
This lawsuit is being led by authors including Abdi Nazemian, Brian Keene, and Susan Orlean. If they win, it could change how the entire AI industry sources its training data. Tech companies might have to pay billions in compensation or delete models trained on pirated material.
For now, one thing is clear: the race to build the smartest AI is getting dirty, and the secrets are starting to spill out.
