First AI case on copyright – but authors were not in the room

English version of my article on portfolio.hu.
The hearing in Case C-250/25, Like Company, took place on 10 March, and the Court had a lot of questions about how large language models (LLMs) use the information they are trained on, and how chatbots reflect the training data on the one hand and data found on the Internet on the other, all in the context of copyright in the digital market (Directive 2001/29/EC and Directive 2019/790). Two questions frame the decision on whether the publisher’s right was infringed: whether using a published article to train an LLM involves reproduction, and whether repeating, in a chatbot’s response, a part of the article large enough to be protected by copyright or a related right is communication to the public. Notably, it was not the author but the publisher who sued Google for infringement of copyright.
The bone of contention was an article on a small local portal about a celebrity wanting to settle dolphins in the biggest warm-water lake of Europe, Lake Balaton in Hungary. The case is being heard by a court in Budapest, which asked the European Court of Justice to interpret the relevant directives so that it can decide whether Google’s AI chatbot, Gemini, infringes the publisher’s rights by using its content without obtaining the publisher’s authorisation and without paying an appropriate fee.
Google’s defence included the argument that the plaintiff could not prove that the articles were reproduced in the chatbot’s responses. Its lawyer could not deny, however, that this can occur and that there are no technical means to prevent it. He argued, though, that the probability of a word-for-word quotation of the original article is minute, among other reasons because the original text is only one of a multitude of training sources and the chatbot generates its output probabilistically, one word after the other.
On the other hand, if a chatbot finds that concrete information from the Internet can improve the quality of the response, it reaches out to published information (it performs a search) and uses what it finds in generating the response. This is a further opportunity to reproduce the publisher’s content and communicate it to the public. Whether the repetition of the content can be proven is not a question for the European Court of Justice. It is in the nature of the reference for a preliminary ruling that the Court of Justice only interprets European law; the assessment of the facts is the task of the court handling the original lawsuit.
Another concrete discussion point was whether Google provided a means for publishers to reject the use of their materials for training AI systems and for generating chatbot responses without disadvantaging the publication’s appearance among the results of Google’s ordinary search engine.
The case matters because publishers have a great interest in appearing in the results of ordinary search engines, where the engine provides a link to the original article. A chatbot’s response, by contrast, in particular if it contains a large chunk of the original content, is sufficient for the user and thus generates no traffic for the original publisher. (An earlier case actually led Google to cut the amount of original article content shown in the search results, to ensure that users click on the link.) So when people turn more and more to chatbots for answers instead of traditional search engines, and these chatbots can provide the original content without benefit to the publisher, publishers will lose their revenues and go bankrupt, and, as the publisher’s lawyer explained, there will be no content for the chatbots to show either.
On the other side are the right to conduct a business and the freedom of information.
Google correctly argued that only the texts are protected by copyright; the information itself is not.
The importance of the case is also illustrated by the fact that not only Hungary but five other Member States and the European Commission intervened in the case, stating their opinions. The Member States mainly supported the plaintiff, while the Commission found that only a small part of the questions is admissible, because some questions are hypothetical and not related to the facts of the case. It was also deplored that only the publisher is suing and the authors are absent from the case. Nevertheless, the Commission still gave its opinion on a number of topics, in case the Court finds the questions admissible after all, mostly also supporting the publisher. The most interesting statement of the Commission was that the technical details explained by Google (that the content is not reproduced, only represented by tokens and vectors, even among the training data) are irrelevant, among other reasons because small companies cannot be expected to be familiar with these technical details or to challenge the technical arguments of the providers of AI systems.
Among the auxiliary questions discussed was whether European copyright law can be applied if the training of the LLM takes place on a server in the US. The general view seems to be that the whole process has to be looked at as one entity and that, when it provides answers in Hungarian or another European language among others, the chatbot’s communication is targeted at a European audience as well. It was also mentioned that Google did not publish an AI compliance policy, which would be its obligation, and that, as Google knows how the systems work, the burden of proving that the content is not being reproduced and/or communicated to the public falls on it.
As the case is complex, the Advocate General will deliver his opinion on 3 September, and this may be a case where the Court does not follow the Advocate General’s opinion in every respect. If the questions, or most of them, are found admissible, this could be a good start to clarifying how far the use of information by AI systems is covered by existing legislation and how far providers of information can rely on a fair deal for the use of their creations.
