AI in the Music Industry – Part 14: AI and Copyright Infringement

As we have seen in previous parts of this blog series, AI applications are quickly reaching the limits of copyright law. This starts with training the AI with huge amounts of music data, continues with processing that data in the hidden layers of the AI, and ends with AI output such as voice clones. Copyright issues can therefore arise throughout the entire process of AI-generated music, which we will explore in this part of the “AI in the Music Industry” series.

Let’s recall how music AI works. Whatever an AI system is used for, whether music recommendation or music creation, it needs huge amounts of audio data to train on. This happens at the input level. The data is then processed into an AI model. As we have seen, it is no longer possible for the operators of an AI to understand exactly how this happens; the model is a black box in which complex processing of the audio files takes place. Ultimately, output is generated in the form of compositions, lyrics and music recordings. For the sake of simplicity, we can therefore divide the AI process into an input phase, a processing phase and an output phase.

In all of these phases, AI can touch on copyright issues and raise legal problems. In the input phase, which we will further divide into the data collection phase and the data training phase, the question is how to deal with copyrighted musical works and recordings. In the data processing phase, where the AI models for output generation are created, reproduction processes could take place that are relevant under copyright law.

The output phase raises a number of legal issues. On the one hand, there is the question of whether pieces of music created by an AI can or should themselves enjoy copyright protection, and who is entitled to such protection: the AI itself, the company that trained the AI or provided the AI software, or the user who applied an AI software tool to create the piece of music. On the other hand, an AI-created work may infringe the rights of third parties: think of the voice clones of superstars that can be easily created and distributed. This raises issues not only of copyright, but also of privacy and data protection.

The Use of Copyrighted Data for AI Training

We will address all these legal issues below, starting with the input phase and the problem that musical works and recordings must first be collected in order to train the AI. Collecting data obviously requires a database in which the works or sound files are stored. In any case, the collection of copyrighted data creates a conflict with copyright law, which protects authors from unauthorised use.[1] However, we must differentiate between the database and the copyright-protected content.

The EU Directive on the legal protection of databases defines a database as “a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means.”[2] It is irrelevant whether the database consists of copyrighted content or not. The protection applies to the database as a whole or to individual parts of it, the selection and arrangement of the contents being regarded as the intellectual creation of the creator(s) of the database. However, the primary object of protection is not the intellectual creation but the investment involved in creating the database, as can be seen from Article 7 ‘Object of Protection’ of the EU Directive. At 15 years from the date of creation of the database, the term of protection is therefore considerably shorter than that of copyright.[3] During this period, database producers have the right to prohibit the extraction of data and its re-utilisation. The Directive defines extraction as the permanent or temporary transfer of the contents of a database to another medium, i.e. copying the data. Re-utilisation means any form of making the contents of the database available to the public.[4]

Accordingly, the German Copyright Act grants the database producer the exclusive right to reproduce, distribute and make the database or parts thereof publicly available.[5] However, reproduction is permitted for text and data mining purposes in accordance with §87c sec 8 line 4. The law defines text and data mining as the automated analysis of individual or multiple digital or digitised works in order to obtain information, in particular about patterns, trends and correlations.[6] This means that AI companies can make copies of database content for the purpose of AI training without the consent of the database producer, provided that the training qualifies as data mining. However, it is questionable whether this limitation in the German Copyright Act also applies to the copyrighted content stored in the database. In any case, copies must be deleted once they are no longer needed for data mining.[7]

In any case, data is currently being collected for AI training without the consent of rights holders. This could change rapidly once the European Union’s ‘AI Act’, agreed by the EU Council and Parliament on 9 December 2023, becomes law and is transposed into national law in EU member states. Article C of the draft makes this clear: “Any use of copyright protected content requires the authorization of the rightsholder concerned unless relevant copyright exceptions apply.”[8] These exceptions are specified in the next paragraph and refer to the EU Database Directive (as amended in 2019), which allows text and data mining under certain conditions. Where the training of AI is not for scientific research, rightsholders can opt out of text and data mining by expressly reserving this right in an appropriate manner. In that case, providers of AI models must obtain permission for text and data mining for the purpose of AI training.

Universal Music Publishing et al versus Anthropic

However, rights holders already see AI training as an infringement of copyright, at least in the US, where there is still no legal regulation of the training of AI systems. On 18 October 2023, Universal Music Publishing (UMP), together with Concord Music Group and ABKCO, which represents the rights of The Rolling Stones, filed a copyright infringement lawsuit against AI company Anthropic in a Nashville district court.[9] Anthropic is the provider of the ‘Claude’ chatbot, a rival product to ChatGPT, and was founded by former OpenAI employees in 2021. In late September 2023, the AI company secured up to US $4 billion in funding from Amazon as part of a wide-ranging collaboration, with the online retailer taking a minority stake in Anthropic.[10] Anthropic had previously raised US $300 million from Google and $500 million from cryptocurrency fraudster Sam Bankman-Fried, as well as investments from video communications platform Zoom and US software company Salesforce.[11]

Like any chatbot, Claude relies on text collected from the internet to train its AI language model. This is where the lawsuit filed by Universal Music Publishing and its co-plaintiffs comes in. They accuse Anthropic of creating an AI model based on huge amounts of text collected from the internet. This includes “(…) lyrics to innumerable musical compositions for which Publishers own or control the copyrights, among countless other copyrighted works harvested from the internet.”[12] The plaintiffs point out that Anthropic neither requested nor received permission from the copyright holders to use the copyrighted works. The lawsuit lists a total of 500 musical works to which the plaintiffs hold copyrights, such as “What a Wonderful World” by Louis Armstrong (Concord Music Group), “You Can’t Always Get What You Want” by the Rolling Stones (ABKCO) or “I Will Survive” by Gloria Gaynor (UMP).[13] When ‘Claude’ is asked for the lyrics of one of these songs, the AI delivers almost identical lyrics, which, according to the plaintiffs, constitutes copyright infringement. Statutory damages under US copyright law can reach $150,000 per infringed work, which would amount to as much as $75 million for 500 infringed works.[14]

However, the lawsuit goes beyond this simple copyright infringement and also accuses the chatbot of making plagiarised copies of the plaintiffs’ copyrighted works. The plaintiffs are trying to prove this by showing that when a request is made without specifically naming the work, author or performer, the AI spits out a copyrighted song lyric without specifying its origin. For example, if ‘Claude’ were asked to write a song about the death of Buddy Holly, the chatbot would generate the hit song ‘American Pie’ by Don McLean,[15] which, according to the plaintiffs, constitutes plagiarism and also removes or alters the mandatory copyright management information. For this, the plaintiffs claim damages of up to US $25,000 for each violation.[16]
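The statutory damages total mentioned above is plain multiplication. The following short Python sketch merely restates that arithmetic; the constant and function names are illustrative, and the two input figures are taken from the text rather than from any independent calculation of what a court might award.

```python
# Worked arithmetic for the statutory damages figure cited from the complaint.
# Both inputs (US $150,000 per work, 500 works) come from the text above;
# the names used here are purely illustrative.

MAX_STATUTORY_DAMAGES_PER_WORK = 150_000  # US $ per infringed work
WORKS_LISTED_IN_COMPLAINT = 500


def total_exposure(per_work: int, works: int) -> int:
    """Upper bound on statutory damages: per-work amount times number of works."""
    return per_work * works


total = total_exposure(MAX_STATUTORY_DAMAGES_PER_WORK, WORKS_LISTED_IN_COMPLAINT)
print(f"US ${total:,}")  # prints "US $75,000,000"
```

The separate claim of US $25,000 per violation cannot be totalled in the same way here, because the text does not state how many such violations are alleged.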

Upon request, Anthropic submitted a statement to the US Copyright Office clarifying that the data was copied during the training process, but only for the purpose of statistical data analysis. The reproduction process is merely an intermediate step in extracting non-protected elements from the dataset, from which the new outputs are then derived. In Anthropic’s view, therefore, the copyrighted work is not reused in order to deliver it to the users of the AI. On this reading, the training does not serve to pass on copies of copyrighted data, but to create AI models that have entirely new properties. According to Anthropic, this is covered by the fair use provisions of the US Copyright Act and also complies with the legal safe harbour provisions in Singapore, Japan, Taiwan, Malaysia, Israel and the European Union.

However, Anthropic did not comment on data collection, which also involves a duplication process, as the data must first be transferred to a database. This is where the rights holders could start. The courts will ultimately decide which legal position prevails, but it is likely that the lawsuit against Anthropic will serve to force the company, and other AI providers such as OpenAI, to the negotiating table to reach a licensing agreement with the rights holders. This has been the solution whenever a new technology has disrupted the music industry: just think of recorded music, radio, MTV and now the various digital music formats. Media analyst Mark Mulligan also recommends that music rights owners deal with AI offerings in this way: “So, the most scalable solution for music rightsholders will be to fix the problem at the top, by ensuring that generative AI tools only learn from what they have permission to learn from. (…) The alternative (trying to license and/or collect royalties on the millions, billions or trillions of songs that will be created) would be a fool’s errand.”[17] Michael Nash, Chief Digital Officer of Universal Music Group, made a similar statement during the company’s earnings call in April 2023, clarifying that “Companies have to obtain permission and execute a license to use copyrighted content for AI training or other purposes, and we’re committed to maintaining these legal principles.”[18]

Licensing Issues of Copyrighted Data for AI Training

However, the question arises of how to implement such a licensing regime. Both the copyright in the musical work and the neighbouring rights to the music recording would be affected, and different licensing practices have emerged for each. In the case of music streaming, for example, the recordings’ master or neighbouring rights are licensed directly between rights holders and streaming platforms. In the case of copyrights in musical works, on the other hand, only collecting societies can license copyrights and collect royalties. In order to avoid later disputes over the distribution of these “AI royalties”, it should therefore be clarified in advance which licensing and collection model is to be used. Under a private-sector model based on the way music streaming services license master rights, significant licensing revenues would flow to the record companies. These revenues would be difficult to redistribute to artists, because it would be almost impossible to generate usage information: an AI processes thousands of individual pieces of data per second. As we have seen, Google DeepMind’s WaveNet uses 16,000 samples per second for a raw audio file.[19] There is also the problem that, with deep learning, it is no longer possible to trace which input information was processed by the AI to arrive at a particular result. Under such circumstances, pay-per-use remuneration is no longer possible. The solution would therefore be to negotiate a flat fee for the use of music recordings and musical works, as has already been done between social media platforms and music rights holders. However, there will be heated debates about how to distribute the resulting pots of money.
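To give a sense of the scale behind the usage-information problem described above, the WaveNet figure can be turned into a per-recording count. This is only a back-of-the-envelope sketch in Python: the sample rate is the one quoted in the text, while the three-minute track length is a hypothetical assumption chosen for illustration.

```python
# Back-of-the-envelope estimate of how many audio samples one recording
# contributes at the sample rate quoted for WaveNet.

SAMPLE_RATE_HZ = 16_000  # samples per second, as quoted in the text
TRACK_SECONDS = 3 * 60   # assumption: a hypothetical three-minute recording

samples_per_track = SAMPLE_RATE_HZ * TRACK_SECONDS
print(f"{samples_per_track:,} samples")  # prints "2,880,000 samples"
```

Even a single recording of that length would contribute almost three million individual data points, which suggests why attributing a given output to specific inputs on a pay-per-use basis is impractical.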

Another option would be to create an exception in copyright law for the use of AI training data, as is already the case in many jurisdictions for the private use of copyrighted works. This would also entail an obligation to pay a fee, which could be calculated on the basis of the amount of data required for AI training. Again, this would create an unallocated pot of royalty income that could be distributed by collecting societies according to transparent and comprehensible rules. As with the flat-rate levy in Germany, part of this income could be used to create funds that subsidise social and cultural projects in the music sector.

The above models will be favoured or rejected depending on the point of view and interests involved. Therefore, a broad socio-political discourse is needed on how to deal with any ‘AI royalties’. In any case, the end result should be a political decision that balances all interests and represents a good compromise for all parties involved. Leaving it solely to the lobbying activities of tech and entertainment companies to find a solution for the use of AI training data would be the worst possible approach.

The Training Process of AI and Copyright

Now we come to the training phase of AI. We have seen that the new AI systems in particular, the Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN), no longer access the original database during training, but parameterise the data in order to map it in a highly abstract form in an AI model. On this view, no new database is created and no duplication process takes place, which would mean that existing copyright law does not apply to this step. No copies of the original data are made for the processing phase; instead, the AI accesses the parameters it has created itself. This can involve millions or even billions of parameters, which are further processed in the hidden layers.

Nevertheless, the German Copyright Initiative (IU) argues in its position paper of September 2023 that there is much to suggest that the trained AI model still contains reproductions in the copyright sense. It is possible for systems such as ChatGPT to reproduce poems or other copyrighted texts, they argue.[20] This argument is in line with the lawsuit filed by Universal Music Publishing and other music publishers against Anthropic, which also refers to the AI’s ability to reproduce almost identical lyrics of hit songs on request. However, it should be noted that the AI does not retrieve a stored copy of the copyrighted material, but rather calculates a specific result based on probabilities. This result can change as a consequence of additional training data or learning processes taking place within the AI. It is therefore not clear whether the trained AI models are actually making copies. In this respect, a new category of copyright or a new type of use may be needed to take account of what happens in the processing phase of the AI.

It should also be borne in mind that once AI models have been created, they can of course be used as training input by other AIs. There would then no longer be a need to collect primary data in order to create new AI models, which would make the data reproduction argument meaningless. What happens to AI models once they have been created is also not yet legally clarified. Can they be reused at all? Would they even have to be deleted if they were created in violation of copyright law? Or could neighbouring rights be granted to the AI models themselves? As you can see, there are still many unanswered legal questions regarding the training of artificial intelligence.
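The tension described above, between a model that only "calculates a result based on probabilities" and a model that can nevertheless output near-identical text, can be illustrated with a deliberately tiny sketch. The Python code below is an assumption-laden toy: it uses simple word-pair counts instead of a neural network, a made-up placeholder text instead of real lyrics, and hypothetical function names. It is not how Claude, ChatGPT or any production system works; it only shows that a model holding nothing but statistics can still regenerate its training text, and that more training data can change the output.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count which word follows which; these counts are the toy model's 'parameters'."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_len):
    """Always pick the most frequent next word (greedy decoding)."""
    out = [start]
    while len(out) < max_len and out[-1] in counts:
        next_word, _ = counts[out[-1]].most_common(1)[0]
        out.append(next_word)
    return out


# Placeholder 'work' (not a real lyric); every word occurs exactly once.
work = ["alpha", "beta", "gamma", "delta", "epsilon"]

# Trained on this text alone, the model stores only follow-on counts,
# yet greedy generation reproduces the text word for word.
model = train_bigram(work)
reproduced = generate(model, "alpha", len(work))

# Additional training data shifts the statistics and therefore the output.
more_data = work + ["alpha", "omega", "alpha", "omega"]
retrained = train_bigram(more_data)
changed = generate(retrained, "alpha", len(work))

print(reproduced == work)  # True: identical output without a stored copy of the sequence
print(changed == work)     # False: the same prompt now yields different text
```

Whether such statistical regeneration amounts to a "reproduction" in the copyright sense is exactly the open legal question; the sketch takes no position on it.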

Endnotes

[1] §15 of the German Copyright Act grants the author the exclusive right to reproduce his work in physical form (§16), to distribute it (§17) and to exhibit it (§18). Gesetz über Urheberrecht und verwandte Schutzrechte (Urheberrechtsgesetz) of 9 September 1965 (BGBl. I p 1273).

[2] Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, amended by Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019, article 1, sec. 2.

[3] Ibid. article 10 “Term of protection”

[4] Ibid. article 7, sec. 2a-b.

[5] § 87a-e Gesetz über Urheberrecht und verwandte Schutzrechte (Urheberrechtsgesetz) of 9 September 1965 (BGBl. I p 1273).

[6] Ibid. § 44b “Text und Data Mining”.

[7] Ibid. § 44b sec. 2.

[8] EU AI Act, “Compromise proposal on general purpose AI models/general purpose AI systems”, December 9, 2023, accessed: 2024-05-06.

[9] Music Business Worldwide, “AI company Anthropic recently secured up to $4bn in investment from Amazon. Now it’s being sued for copyright infringement by Universal Music Group”, October 18, 2023, accessed: 2024-05-06.

[10] Anthropic press release, “Expanding access to safer AI with Amazon”, September 25, 2023, accessed: 2024-05-06.

[11] Music Business Worldwide, “Blatant Plagiarism? 5 key takeaways from Universal’s lyrics lawsuit against AI unicorn Anthropic”, October 23, 2023, accessed: 2024-05-06.

[12] Concord Music Group, Inc. v. Anthropic PBC, Case 3:23-cv-01092, Complaint and Demand for Jury Trial in the United States District Court for the Middle District of Tennessee, Nashville Division, October 18, 2023.

[13] Ibid. appendix.

[14] Ibid., p 58.

[15] Ibid., pp 29-30.

[16] Ibid., p 58.

[17] Mark Mulligan, “AI will transform music; the question is how?”, Media Research Blog, April 18, 2023, accessed: 2024-05-06.

[18] Music Business Worldwide, “Universal Music Group: Yes, ripping off Drake’s voice for that AI track was illegal – and we’re certain of it”, April 27, 2023, accessed: 2024-05-06.

[19] Aaron van den Oord et al., 2016, “WaveNet: A generative model for raw audio”, arXiv:1609.03499 [Cs], p 1.

[20] The German Copyright Initiative (IU) is a platform of 44 professional organisations and trade unions representing 140,000 authors and artists. The text refers to the position paper “Generative KI: Urheberrechtlicher Status quo & Handlungsempfehlungen”, which was published on 19 September 2023.

Music Business Research
