Artificial intelligence software such as ChatGPT, Google’s Gemini and Anthropic’s Claude mimics human intelligence by using artificial neural networks. Artificial neural networks use computer code and mathematics to simulate how neurons in a brain interact and transmit signals to one another.
Just as children need to be educated, AI software needs its neural networks to be trained on data to learn how to mimic human intelligence.
The GPT-3 model that preceded ChatGPT, for example, was reportedly trained on roughly 570 GB of filtered text; training datasets of this kind draw heavily on fiction and nonfiction books, including textbooks. Much of this material came in the form of books that AI companies downloaded illegally, without the consent of the authors.
One of the central legal questions of the AI age is whether copyright law protects authors from having their books used to train AI models without consent or payment. This issue was addressed in a landmark court ruling this month (June 2025), which found that training AI models on copyrighted materials can be covered by the concept of “fair use” in copyright law.
Fair use is a doctrine in copyright law that allows limited use of copyrighted material without permission from the copyright holder. Section 107 of the United States Copyright Act reads in part:
“In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include—
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.”
On June 23, 2025, a federal judge in San Francisco issued a major ruling on AI and fair use. The AI firm Anthropic, creator of the AI called Claude, was sued by several authors for violating their copyright in the case of Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson v Anthropic PBC. In a summary judgment ruling, Judge William Alsup introduced the problem as follows:
“An artificial intelligence firm downloaded for free millions of copyrighted books in digital form from pirate sites on the Internet. The firm also purchased copyrighted books (some overlapping with those acquired from the pirate sites), tore off the bindings, scanned every page, and stored them in digitised, searchable files. All the foregoing was done to amass a central library of “all the books in the world” to retain “forever.” From this central library, the AI firm selected various sets and subsets of digitised books to train various large language models under development to power its AI services. Some of these books were written by plaintiff authors, who now sue for copyright infringement. On summary judgment, the issue is the extent to which any of the uses of the works in question qualify as “fair uses” under Section 107 of the Copyright Act.”
On the training question, the court ruled in Anthropic’s favour, holding that the copies used to train LLMs were covered by fair use. This makes sense, as creating obstacles to AI research in the United States would harm economic growth and innovation. However, the court took issue with the library of pirated books that Anthropic kept.
“The downloaded pirated copies used to build a central library were not justified by a fair use. Every factor points against fair use. Anthropic employees said copies of works (pirated ones, too) would be retained “forever” for “general purpose” even after Anthropic determined they would never be used for training LLMs. A separate justification was required for each use. None is even offered here except for Anthropic’s pocketbook and convenience.”
The court then ordered a trial on the copyright claims arising from the library of pirated books.
“We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages, actual or statutory (including for wilfulness). That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”
As an author of two books myself, I hope that courts strengthen copyright protections and ensure that companies like Anthropic, which is valued at US$60 billion, pay authors for copies of their books instead of pirating them to create libraries of content that train AI models.