Authors File Class Action Against Anthropic for Alleged Use of Pirated Books

A group of authors has filed a class action lawsuit against Anthropic, a prominent artificial intelligence company, alleging that it used pirated books to train its AI models. The suit, first reported by Reuters, was filed in a California court on Monday. The authors allege that Anthropic “built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books.”

Details of the Lawsuit

The lawsuit alleges that Anthropic used a large open-source dataset known as "The Pile" to train Claude, its AI chatbot. Embedded within that dataset is Books3, a massive collection of pirated ebooks that includes works by renowned authors such as Stephen King and Michael Pollan, among thousands of others. Anthropic confirmed to Vox earlier this month that it did use The Pile to train its models.

Claims Made by the Plaintiffs

The complaint states, "It is apparent that Anthropic downloaded and reproduced copies of The Pile and Books3, knowing that these datasets were comprised of a trove of copyrighted content sourced from pirate websites such as Bibliotik." The plaintiffs are asking the court to certify the class action and are seeking damages from Anthropic, as well as an order barring the company's future use of copyrighted materials.

Background on the Authors

The plaintiffs include established names such as Andrea Bartz, known for her novel We Were Never Here; Charles Graeber, author of The Good Nurse; and Kirk Wallace Johnson, who wrote The Feather Thief. While acknowledging that Books3 has been removed from the most recent official version of The Pile, the authors assert that the original version remains accessible online.

Related Legal Actions Against AI Companies

This lawsuit isn't the first of its kind. Last year, former Arkansas Governor Mike Huckabee and a group of writers filed a similar suit against major tech companies, including Meta and Microsoft, accusing them of using the authors' works without permission to train AI models. Notable authors such as George R.R. Martin and Jodi Picoult have also sued OpenAI for alleged copyright infringement involving their work.

Implications for AI Development

The outcome of these lawsuits could have significant consequences for the future of AI development and copyright law. As companies increasingly depend on large datasets to train their models, the legality of drawing on content from potentially infringing sources is coming under scrutiny. By asserting their rights, authors may be paving the way for stricter rules on the use of copyrighted material in AI training.

Conclusion

The legal battle against Anthropic underscores growing concern within the creative community about the ethical implications of AI model training. As these cases unfold, they may reshape the landscape for AI companies and content creators alike in the ongoing debate over copyright and innovation.

Call to Action

For those interested in the intersection of technology and copyright law, this case serves as a critical example of how AI companies must navigate complex legal territory. Keep an eye on the developments surrounding this lawsuit and engage in discussions about the implications for both authors and tech innovators.
