AI Large Language Models breaking law ‘on massive scale’

The Publishers Association's CEO Dan Conway tells UK MPs about the danger of LLMs

by Suswati Basu

In a recent session before the UK’s Communications and Digital Committee, Dan Conway, CEO of the Publishers Association, raised alarm bells about Large Language Models (LLMs) and their alleged widespread infringement of copyright law. During the session, Conway emphasised the need for the safe and responsible development of AI technologies. The session also featured insights from experts Arnav Joshi, Richard Mollet, and Dr. Hayleigh Bosher, shedding light on the complex intersection of copyright law, intellectual property, and AI innovation.

Full committee meeting from Parliament TV.

Large language models and copyright infringement

Conway, representing the Publishers Association, kicked off the discussion by acknowledging the potential of LLMs and technological advancements but expressed concern about their current state. He highlighted that LLMs are infringing on copyrighted content on a massive scale, citing the existence of the “Books3 database,” which contains pirated book titles ingested by these models. According to Conway, the content ingestion is vast, and the LLMs are not properly licensing this material, thereby violating copyright laws.

Conway stated, “They’re not currently being compliant with IP law. We’ve had conversations with technical experts around the processes undergone by these large language models, and it is our contention to the committee that these large language models infringe copyright at multiple parts of the process in terms of when they collect this information, how they store this information, and how they handle it. So it’s our contention that copyright law is being broken on a massive scale.”

Dr. Hayleigh Bosher, an expert in intellectual property law, reiterated the principle of copyright: to grant creators the right to decide how their work is used and whether a license is required. She asserted that AI’s actions, such as ingestion, running programs, and generating content, would typically require a license as they involve reproducing copyrighted works without permission.

Clifford Chance Senior Associate Arnav Joshi, focusing on digital regulation and data protection matters, deferred to other panel members but underlined the need for clarity and guidance for companies navigating the complexities of AI development and data protection.

Richard Mollet, representing RELX, shared insights from various sectors and stressed the importance of transparency in data usage to reward creators, incentivise high-quality data creation, and ensure trust in AI outputs. He also noted that while US law might differ in its interpretation of copyright, UK and EU laws are clear that commercial entities reproducing copyrighted works must obtain permission.

Copyright, innovation, and legitimate interests

The committee delved deeper into the role of copyright in promoting innovation. Dr. Bosher argued that copyright aims to encourage creativity and innovation while balancing the protection of creators’ rights. Mollet added that copyright drives innovation by incentivising companies to invest in data quality and peer-reviewed content, essential in fields like scientific and legal research.

Lord Kamall raised the question of whether copyright hinders or encourages innovation. Dr. Bosher and Mollet both underscored that copyright doesn’t hinder innovation but rather provides a framework to encourage creativity and protect creators’ interests.

Understanding the distinction: reading vs. copying

Baroness Anna Healy posed a question about the distinction between reading and copying concerning copyright and large language models. Conway explained that copyright covers a bundle of exclusive rights, including reproduction, communication to the public, and more. He argued that LLMs, during their technical processes, must make copies of copyrighted content, making them subject to copyright law.

Dr. Bosher cautioned that metaphors such as reading can be misleading; the critical factor is why AI processes data and whether it does so for commercial purposes. She argued that AI developers process copyrighted data for commercial gain, which requires a license, and that doing so without one undermines the purpose of copyright.

Privacy and data protection

Joshi focused on privacy and personal data in relation to LLMs. He noted that most companies are genuinely trying to comply with data protection laws, although they seek more specific guidance. He cited an Ipsos study showing that only 9% of people use generative AI, highlighting misconceptions around AI’s impact.

He also discussed GDPR compliance, explaining that while challenges exist, companies generally comply by conducting data protection impact assessments. He emphasised that GDPR’s technology-neutral approach allows organisations to mark their own compliance homework, with regulators stepping in only when necessary.

Baroness Healy raised concerns about inadvertently leaking private information and the difficulty in rectifying such breaches once data is exposed. Joshi acknowledged the risk but stated that non-compliance with data protection laws is not widespread, though vigilance is necessary.

Overall, the session called attention to the complex legal and ethical issues surrounding LLMs, copyright, and data protection. The debate underlined the importance of ensuring AI development aligns with intellectual property laws, fosters innovation, and protects individuals’ privacy and rights. The call for safe and responsible AI development resonated as experts and policymakers grapple with these challenges in the digital age.
