Is artificial intelligence threatening copyright?

Midjourney's generative artificial intelligence went live in beta on July 12, 2022. Since then, one of its creations has won first prize in an art competition [1], a manga designed entirely with its artificial intelligence has been published in Japan [2], and version 5 of its algorithm has been at the heart of a "fake news" scandal involving a photo of the Pope in a white puffer jacket and photos of the President of the French Republic collecting trash [3]. This scandal prompted its founder, David Holz, to end free access to the service [4].
Beyond the question of whether intellectual property law protects content generated by artificial intelligence, which was addressed in a previously published article [5], what about the protection of the content that artificial intelligences use to train their algorithms and create graphic content?
1) Does training artificial intelligences on protected content constitute a violation of intellectual property rights?
The data used to train artificial intelligence algorithms may be subject to various types of intellectual property protection, such as copyright, patents, trademarks or trade secrets. As we saw recently, this data may also be protected by personal data protection legislation [6].
With respect to copyright in particular, using protected works to train artificial intelligence algorithms may be considered infringement if it is done without the prior authorization of the rights holders. Whether such use actually violates intellectual property rights, however, depends both on the specific circumstances of each case and on the country in which one is located.
In France, the law provides that any reproduction, use or adaptation of a work without the authorization of the rights holder may constitute the offense of infringement (contrefaçon). Case law adds that infringement is assessed on the basis of the similarities between the protected work and the allegedly infringing work, not the differences between them. Within the European Union, each Member State has its own copyright legislation, so the assessment of infringement varies from country to country. In common-law countries, the applicable regime is copyright, whose logic differs from that of the continental droit d'auteur: copyright aims first and foremost to protect the commercial interests of authors, whereas droit d'auteur also protects their moral rights.
Several copyright infringement actions have been brought against generative artificial intelligence applications in recent months. A copyright infringement class action against Stability AI, Midjourney and DeviantArt is currently pending before the U.S. District Court for the Northern District of California [7]. The Getty Images image bank, for its part, has sued Stability AI, the developer of Stable Diffusion, for training its artificial intelligence on more than 12 million works from its image bank without authorization or financial compensation for the authors [8].
Also in the United States, Apple recently had to stop using recordings by human audiobook narrators to train its audiobook-reading artificial intelligence, after the narrators' unions objected to the practice [9].
The question of whether generative artificial intelligences infringe copyright is therefore complex and depends on the applicable law. We are eagerly awaiting the outcome of the actions brought before the courts across the Atlantic, and a French case will very likely follow soon.
2) Anticipating legislative changes: for or against artificial intelligence?
The regulation of the development and use of artificial intelligence is ongoing and will continue in the years to come. In Europe, in April 2021, the European Commission published a proposal for a regulation on artificial intelligence, whose final adoption is expected in 2023. As a regulation, once adopted, the text will be directly applicable throughout the European Union without national transposition. The proposal aims to protect fundamental rights while encouraging the development of and innovation in artificial intelligence in Europe, by setting strict transparency and accountability requirements for artificial intelligence systems.
This proposed regulation takes a horizontal approach to the development and use of artificial intelligence rather than a sectoral one. As a result, it does not directly address generative artificial intelligence or the content on which it is trained.
However, one part of the text is relevant to the issue at hand: the data mining exception for the benefit of artificial intelligence. This exception would allow artificial intelligence developers to use datasets protected by copyright or other intellectual property rights for research or innovation purposes. It would apply only if the data is not used for commercial purposes and if developers take reasonable steps to prevent unauthorized access to, or copying of, the data.
This data mining exception is similar to the text and data mining exception set out in the European Copyright Directive, adopted in 2019, which aims to harmonize copyright rules across the European Union. Both European texts seek to encourage the development of artificial intelligence through training on pre-existing content, while safeguarding the economic interests of the creators of that content through measures that restrict access to it. It therefore appears that the protection of the content on which artificial intelligences are trained, whether under droit d'auteur or copyright, could end up being ensured exclusively by technical solutions built into artificial intelligence applications, whose role would be to restrict access to content [10].
Across the Atlantic, Congress has examined several bills on artificial intelligence but, as in the European Union, none specifically addresses the fate of the content used to train generative artificial intelligence algorithms.
Pending legislation on the specific case of generative artificial intelligence and intellectual property, it is reasonable to expect case law to play an essential role in developing a doctrine on the subject, as it did for the intellectual property protection of creations generated by artificial intelligence [11].
3) Is it currently possible to protect your creations against being processed by artificial intelligence?
In the absence, for now, of sector-specific regulation and of case law on the fate of content used to train artificial intelligence algorithms, technical solutions are being put in place both by artificial intelligence developers and by intellectual property rights holders.
On the side of generative artificial intelligence developers, first of all, the "opt-out" technique has been implemented, notably on DeviantArt [12]. It allows authors who do not want their content to be used to train artificial intelligence algorithms to say so and to have their content removed from the database accessible to artificial intelligence. For their part, OpenAI and Meta have partnered with Shutterstock, which allows them to train their artificial intelligences on its image bank. In return, Shutterstock users get direct access to the DALL-E generative artificial intelligence program [13].
On the side of intellectual property rights holders and their representatives, content protection strategies are also being implemented. For example, unlike Shutterstock, other image banks are more cautious about images generated by artificial intelligence: Getty Images has prohibited the upload and sale of such images, while Adobe Stock allows them to be sold provided they are clearly labeled as such.
Applications and websites have been created specifically to allow intellectual property rights holders to protect their creations from web scraping by artificial intelligences. The website Have I Been Trained? [14] lets Internet users search the public LAION-5B database, which is used in particular to train the Stable Diffusion generative artificial intelligence, to find out whether their images have been used to train artificial intelligence.
In the same spirit, the Glaze application applies a digital "veneer" to creations in order to disrupt how they are read by generative artificial intelligences that try to use them [15]. According to its designers, the application is an emergency solution that generative artificial intelligence applications will no doubt soon circumvent, but for the time being it offers a satisfactory defense against the non-consensual use of content by artificial intelligences.
Link to the article published in Le Village de la Justice magazine.
[9] https://www.wired.com/story/apple-spotify-audiobook-narrators-ai-contract/
[10] See next paragraph for the first examples of these measures.
[11] https://www.copyright.gov/docs/zarya-of-the-dawn.pdf
[14] https://haveibeentrained.com/