Partly legal, partly illegal

Judge: Anthropic may train AI with books without authors' permission

Image: Anthropic (source: rafapress / Shutterstock.com)

In a landmark ruling, a US federal court in San Francisco has decided that the use of copyrighted books to train an AI model is legal under certain conditions.

At the same time, however, the court also found that the storage and use of illegally obtained books constitutes a copyright infringement.


Partial success for Anthropic

The technology company Anthropic, which is backed by Amazon and Alphabet, was taken to court over the use of books to train its Claude language model. The lawsuit was filed by several authors, including Andrea Bartz, Charles Graeber and Kirk Wallace Johnson. They accused Anthropic of using their works without permission and without remuneration.

Judge William Alsup has now ruled that training the AI model with this content falls under the so-called "fair use" doctrine, meaning the use in this case is legally permissible. The decisive factor was the assessment that the AI model used the content in a transformative way: it did not copy the books, but developed something fundamentally new from them.

Limits of the fair use doctrine

At the same time, however, the court drew a clear line: the storage of more than seven million illegally obtained digital copies of books in an internal database (a "central library") was not covered by fair use and constitutes a direct copyright infringement. On this point, Judge Alsup has scheduled proceedings for December to determine damages. US copyright law provides for penalties of up to 150,000 US dollars per affected work in cases of willful infringement.


Setting a precedent for the AI industry

This decision is the first of its kind to explicitly address the concept of fair use in the context of generative artificial intelligence. Companies such as Anthropic, OpenAI and Meta argue that their AI systems foster creative output through the use of copyrighted content and therefore fall under the protection of this doctrine.

Judge Alsup partially recognized this point of view, calling AI training "extraordinarily transformative". In his view, the comparison with a learning human is apt: a language model that analyzes books does not do so to replace them, but to create something new from them.

Dispute over the origin of the data

The court identified a critical point in the origin of the content used. Anthropic had argued that it was irrelevant whether the books came from legal or illegal sources. The judge clearly disagreed: the deliberate procurement of pirated copies could in no way be justified by fair use, especially when legal alternatives were available.

Background: Growing number of lawsuits

The case is one in a series of lawsuits that authors, media companies and other rights holders have filed against leading AI companies. They all accuse the companies of using intellectual property without permission to develop their systems – and potentially undermining existing business models in the process.

The ruling sets an important precedent for the evaluation of AI training practices in the light of copyright law. While the use of copyrighted content is considered legal under certain conditions, the court also issued a clear warning about how that content is obtained. The outcome of the damages trial in December could be groundbreaking for the entire industry.
