
First AI case on copyright – but authors were not in the room

English version of my article on portfolio.hu.
The hearing in Case C-250/25, Like Company, took place on 10 March, and the Court had many questions about how large language models (LLMs) use the information they are trained on, and how chatbots reflect the training data on the one hand and data found on the Internet on the other, in the context of copyright in the digital market (Directive 2001/29/EC and Directive 2019/790). Two questions frame the decision on whether the publisher’s right was infringed: whether using a published article to train an LLM involves reproduction, and whether repeating, in a chatbot’s response, a part of the article large enough to be protected by copyright or a related right is communication to the public. Notably, it was not the author but the publisher who sued Google for copyright infringement.
The bone of contention was an article on a small local portal about a celebrity who wanted to settle dolphins in Lake Balaton in Hungary, the biggest warm-water lake in Europe. The case is before a court in Budapest, which asked the European Court of Justice to interpret the relevant directives: do the activities of Google’s AI chatbot, Gemini, infringe them when the publisher’s authorisation is not obtained and no appropriate fee is paid for using its content?
Google’s defence included the argument that the plaintiff could not prove that the articles were reproduced in the chatbot’s responses. Its lawyer could not deny, however, that this can occur and that there are no technical means to prevent it. He argued that the probability of a word-for-word quotation of the original article is minute, among other reasons because the original text is only one of a multitude of training sources and because the chatbot generates its response one word after the other on the basis of probabilities.
On the other hand, if a chatbot finds that concrete information from the Internet can improve the quality of its response, it performs a search of published information and uses what it finds in generating the response. This is another opportunity to reproduce the publisher’s content and communicate it to the public. Whether the repetition of the content can be proven is not a question for the European Court of Justice. It is in the nature of a reference for a preliminary ruling that the Court of Justice only interprets European law; the assessment of the facts is the task of the court handling the original lawsuit.
Another concrete discussion point was whether Google gave publishers a means to refuse the use of their materials for training AI systems and for generating chatbot responses without harming the publication’s ranking in the results of Google’s ordinary search engine.
The importance of the case lies in the following. Publishers have a great interest in appearing in the results of ordinary search engines, where the engine provides a link to the original article. A chatbot’s response, in particular if it contains a large chunk of the original content, is sufficient for the user and therefore generates no traffic for the original publisher. (An earlier case actually led Google to cut the amount of original article content shown in search results, to ensure that users click on the link.) So when people turn more and more to chatbots for answers instead of traditional search engines, and these chatbots can provide the original content without any benefit to the publisher, publishers will lose their revenues and go bankrupt, and, as the publisher’s lawyer explained, there will be no content left for the chatbots to show either.
On the other side are the right to conduct business and the freedom of information.
Google correctly argued that only the texts are protected by copyright; the information itself is not.
The importance of the case is also illustrated by the fact that not only Hungary but also five other Member States and the European Commission intervened in the case and stated their opinions. The Member States mainly supported the plaintiff. The Commission found that only a small part of the questions is admissible, because some questions are hypothetical and are not related to the facts of the case. It also deplored that only the publisher is suing and the authors are absent from the case. Nevertheless, in case the Court finds the questions admissible, the Commission gave its opinion on a number of topics, mostly also supporting the publisher. The Commission’s most interesting statement was that the technical details explained by Google (that the content is not reproduced, but only represented by tokens and vectors, even in the training data) are irrelevant, among other reasons because small companies cannot be expected to be familiar with these technical details or to challenge the technical arguments of the providers of AI systems.
One auxiliary question discussed was whether European copyright law can be applied if the training of the LLM takes place on a server in the US. The general view seems to be that the whole process has to be looked at as one entity and that, when the chatbot provides answers in Hungarian or another European language, among others, its communication is targeted at a European audience as well. It was also mentioned that Google did not publish an AI compliance policy, which would be its obligation, and that, as Google knows how the systems work, the burden of proving that the content is not being reproduced and/or communicated to the public falls on it.
As the case is complex, the Advocate General will deliver his opinion on 3 September, and this may be a case where the Court does not follow the Advocate General’s opinion in every respect. If the questions, or most of them, are found admissible, this could be a good start in clarifying how far the use of information by AI systems is covered by existing legislation and how far providers of information can rely on a fair deal for the use of their creations.
