As we have seen in previous parts of this blog series, AI applications are quickly reaching the limits of copyright law. This starts with training the AI with huge amounts of music data, continues with processing that data in the hidden layers of the AI, and ends with AI output such as voice clones. Copyright issues can therefore arise throughout the entire process of AI-generated music, which we will explore in this part of the “AI in the Music Industry” series.

AI in the Music Industry – Part 14: AI and Copyright Infringement

Let’s recall how music AI works. Whatever AI systems are used, whether for music recommendation or music creation, they need huge amounts of audio data to train on. This happens at the input level. The data is then processed into an AI model. As we have seen, it is no longer possible for the operators of an AI to understand how this happens, so the model is a black box in which complex processing of the audio files takes place. Ultimately, the data is output in the form of compositions, lyrics and music recordings. For the sake of simplicity, we can therefore divide the AI process into an input phase, a processing phase and an output phase.

In all of these phases, AI can touch on copyright issues and raise legal problems. In the input phase, which we will further divide into the data collection phase and the data training phase, there is the question of how to deal with copyrighted musical works and recordings. In the data processing phase, where the AI models for output generation are created, reproduction processes could take place that are relevant under copyright law. The output phase raises a number of legal issues. On the one hand, there is the question of whether pieces of music created by an AI can or should themselves enjoy copyright protection, and who is entitled to such protection: the AI itself, the company that trained the AI or provided the AI software, or the user who applied an AI software tool to create the piece of music. On the other hand, an AI-created work may infringe the rights of third parties – think of the voice clones of superstars that can be easily created and distributed. This raises issues not only of copyright, but also of privacy and data protection.

The Use of Copyrighted Data for AI Training

Accordingly, the German Copyright Act grants the database producer the exclusive right to reproduce, distribute and make the database or parts thereof publicly available.[5] However, reproduction is permitted for text and data mining purposes in accordance with §87c sec 8 line 4. The law defines text and data mining as the automated analysis of individual or multiple digital or digitised works in order to obtain information, in particular about patterns, trends and correlations.[6] This means that AI companies can make copies of database content without the consent of the database producer for the purpose of AI training, provided that this falls under text and data mining. However, it is questionable whether this limitation in the German Copyright Act also applies to copyrighted content stored in the database. In any case, copies must be deleted once they are no longer needed for text and data mining.[7]

In any case, data is currently being collected for AI training without the consent of rights holders. This could change rapidly once the European Union’s ‘AI Act’, agreed by the EU Council and Parliament on 9 December 2023, becomes law and is transposed into national law in EU member states. Article C of the draft makes this clear: “Any use of copyright protected content requires the authorization of the rightsholder concerned unless relevant copyright exceptions apply.”[8] These exceptions are specified in the next paragraph and refer to the EU Database Directive (as amended in 2019), which allows text and data mining under certain conditions. Where the training of AI is not for scientific research, rightsholders have the right to opt out of text and data mining if they expressly reserve this right in an appropriate manner. In this case, providers of AI models must obtain permission for text and data mining for the purpose of AI training.

Universal Music Publishing et al versus Anthropic

However, rights holders already see AI training as an infringement of copyright, at least in the US, where there is still no legal regulation of the training of AI systems. On 18 October 2023, Universal Music Publishing (UMP), together with Concord Music Group and ABKCO, which represents the rights of The Rolling Stones, filed a copyright infringement lawsuit against AI company Anthropic in a Nashville district court.[9] Anthropic is the provider of the ‘Claude’ chatbot, a rival product to ChatGPT, and was founded by former OpenAI employees in 2021. The AI company secured up to US $4 billion in funding from Amazon in late September 2023 as part of a wide-ranging collaboration, and the online retailer bought a minority stake in Anthropic.[10] Anthropic had previously raised US $300 million from Google, US $500 million from cryptocurrency fraudster Sam Bankman-Fried, as well as investments from video communications platform Zoom and US software company Salesforce.[11]

Like any chatbot, Claude relies on text collected from the internet to train its AI language model. This is where the lawsuit filed by Universal Music Publishing and its co-plaintiffs comes in. They accuse Anthropic of creating an AI model based on huge amounts of text collected from the internet. This includes “(…) lyrics to innumerable musical compositions for which Publishers own or control the copyrights, among countless other copyrighted works harvested from the internet.”[12] The plaintiffs point out that Anthropic neither requested nor received permission from the copyright holders to use the copyrighted works. The lawsuit lists a total of 500 musical works to which the plaintiffs hold copyrights, such as “What a Wonderful World” by Louis Armstrong (Concord Music Group), “You Can’t Always Get What You Want” by the Rolling Stones (ABKCO) or “I Will Survive” by Gloria Gaynor (UMP).[13] When ‘Claude’ is asked for the lyrics of one of these songs, it delivers almost identical lyrics, which, according to the plaintiffs, is a copyright infringement. The statutory damages sought under US copyright law are up to $150,000 per infringed work, which would amount to up to $75 million for 500 works.[14] However, the lawsuit goes beyond this simple copyright infringement and also accuses the chatbot of making plagiarised copies of the plaintiffs’ copyrighted works. The plaintiffs are trying to prove this by showing that when a request is made without specifically naming the work, author or performer, the AI outputs a copyrighted song lyric without specifying its origin. For example, if ‘Claude’ were asked to write a song about the death of Buddy Holly, the chatbot would generate the lyrics of the hit song ‘American Pie’ by Don McLean,[15] which, according to the plaintiffs, constitutes plagiarism and also alters the mandatory copyright management information. For this, the plaintiffs claim damages of US $25,000 for each infringement.[16]

Upon request, Anthropic submitted a statement to the US Copyright Office clarifying that the data was copied during the training process, but only for the purpose of statistical data analysis. The reproduction process is merely an intermediate step in extracting non-protected elements from the dataset, from which the new outputs are then derived. In Anthropic’s view, therefore, the copyrighted work is not reused to deliver it to the users of the AI. On this argument, the copies are not made for their own sake; rather, AI models are being created that have entirely new properties. According to Anthropic, this is covered by the fair use provisions of the US Copyright Act and also complies with the legal safe harbour provisions in Singapore, Japan, Taiwan, Malaysia, Israel and the European Union.

However, Anthropic did not comment on data collection, which also involves a duplication process, as the data must first be transferred to a database. This is where the rights holders could start. The courts will ultimately decide which legal position prevails, but it is likely that the lawsuit against Anthropic will serve to force the company, and other AI providers such as OpenAI, to the negotiating table to reach a licensing agreement with the rights holders. This has been the solution whenever a new technology has disrupted the music industry – just think of recorded music, radio, MTV and now the various digital music formats. Media analyst Mark Mulligan also recommends that music rights owners deal with AI offerings in this way: “So, the most scalable solution for music rightsholders will be to fix the problem at the top, by ensuring that generative AI tools only learn from what they have permission to learn from. (…) The alternative (trying to license and/or collect royalties on the millions, billions or trillions of songs that will be created) would be a fool’s errand.”[17] Michael Nash, Chief Digital Officer of Universal Music Group, made a similar statement during the company’s earnings call in April 2023, clarifying that “Companies have to obtain permission and execute a license to use copyrighted content for AI training or other purposes, and we’re committed to maintaining these legal principles.”[18]

Licensing Issues of Copyrighted Data for AI Training

Another option would be to create an exception in copyright law for the use of AI training data, as is already the case in many jurisdictions for the private use of copyrighted works. This would also entail the obligation to pay a fee, which could be calculated on the basis of the amount of data required for AI training. Again, this would create an unallocated pot of royalty income that could be distributed by collecting societies according to transparent and comprehensible rules. Similar to the flat-rate levy in Germany, this could be used to create funds to distribute subsidies for social and cultural projects in the music sector.

The above models will be favoured or rejected depending on the point of view and interests involved. Therefore, a broad socio-political discourse is needed on how to deal with any ‘AI royalties’. In any case, the end result should be a political decision that balances all interests and represents a good compromise for all parties involved. Leaving it solely to the lobbying activities of tech and entertainment companies to find a solution for the use of AI training data would be the worst approach.

The Training Process of AI and Copyright


Endnotes

[1] §15 of the German Copyright Act grants the author the exclusive right to reproduce his work in physical form (§16), to distribute it (§17) and to exhibit it (§18). Gesetz über Urheberrecht und verwandte Schutzrechte (Urheberrechtsgesetz) of 9 September 1965 (BGBl. I p 1273).

[2] Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, amended by Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019, article 1, sec. 2.

[3] Ibid. article 10 “Term of protection”

[4] Ibid. article 7, sec. 2a-b.

[5] § 87a-e Gesetz über Urheberrecht und verwandte Schutzrechte (Urheberrechtsgesetz) of 9 September 1965 (BGBl. I p 1273).

[6] Ibid. § 44b “Text und Data Mining”.

[7] Ibid. § 44b sec. 2.

[8] EU AI Act, “Compromise proposal on general purpose AI models/general purpose AI systems”, December 9, 2023, accessed: 2024-05-06.

[9] Music Business Worldwide, “AI company Anthropic recently secured up to $4bn in investment from Amazon. Now it’s being sued for copyright infringement by Universal Music Group”, October 18, 2023, accessed: 2024-05-06.

[10] Anthropic press release, “Expanding access to safer AI with Amazon”, September 25, 2023, accessed: 2024-05-06.

[11] Music Business Worldwide, “Blatant Plagiarism? 5 key takeaways from Universal’s lyrics lawsuit against AI unicorn Anthropic”, October 23, 2023, accessed: 2024-05-06.

[12] Concord Music Group, Inc. v. Anthropic PBC, Case 3:23-cv-01092, Complaint and Demand for Jury Trial in the United States District Court for the Middle District of Tennessee, Nashville Division, October 18, 2023.

[13] Ibid. appendix.

[14] Ibid., p 58.

[15] Ibid., pp 29-30.

[16] Ibid., p 58.

[17] Mark Mulligan, “AI will transform music; the question is how?”, Media Research Blog, April 18, 2023, accessed: 2024-05-06.

[18] Music Business Worldwide, “Universal Music Group: Yes, ripping off Drake’s voice for that AI track was illegal – and we’re certain of it”, April 27, 2023, accessed: 2024-05-06.

[19] Aaron van den Oord et al., 2016, “WaveNet: A generative model for raw audio”, arXiv:1609.03499 [Cs], p 1.

[20] The German Copyright Initiative (IU) is a platform of 44 professional organisations and trade unions representing 140,000 authors and artists. The text refers to the position paper “Generative KI: Urheberrechtlicher Status quo & Handlungsempfehlungen”, which was published on 19 September 2023.
