Whatever the metaverse is – whether an augmentation of the real world, any number of artificial virtual worlds, or both – it will certainly be characterized by an overlay of unfathomably vast amounts of information, or “data.” That information will be created and distributed from within the metaverse itself, that is, from within an environment imagined and created by a person and controlled by a particular entity (for example, the developer of a game and, increasingly, any other business wanting to be present in the metaverse). Unlike the real world, the metaverse is entirely manufactured. There will be no digital tree or cloud in the metaverse that doesn’t “belong” to its creator. From the look of our avatars to the clothes we wear and the cars we drive in the metaverse, we can expect that almost everything will be somebody’s intellectual property.
AI uses machine learning technologies to review, digest, and analyze vast quantities of data and to derive from them rules of application, commonly described as models or algorithms. Once “educated,” machine learning software can continue to improve through the analysis of new data sources and through the observation of its own output. More recently, AI has expanded to include artificial neural networks, computing systems loosely modeled on the way the human brain analyzes and processes information, as well as generative adversarial networks, in which two neural networks are paired and trained against each other: one generates output and the other evaluates it.
The massive ingestion of data by AI machines, and the works they create, have generated considerable debate. Can AI digest massive databases that include copyrighted works and use machine learning to “author” creative works without infringing on copyright? In addition, is the output generated by AI protectable under copyright?
Machine learning and fair use
As AI search engines crawl the World Wide Web, endlessly seeking, digesting, and aggregating content, they inevitably digest copyrighted works such as music videos, songs, novels, and news stories. Since this digestion is frequently performed without the consent of the copyright holder, its legality depends on whether it falls within a permitted exception to copyright law or outside the framework of copyright law altogether. Under U.S. copyright law, the exception most frequently relied on is “fair use.”
Under section 107 of the Copyright Act, “fair use” is determined by a four-factor test: (1) the purpose and character of the use; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use on the potential market for, or value of, the copyrighted work. Section 107 specifically names purposes such as criticism, comment, news reporting, teaching, scholarship, and research as examples of uses that may qualify as fair use, although the four factors must still be weighed in each case. A key consideration that courts have used in deciding whether fair use exists is whether the use is “transformative.”
Whether machine learning of copyrighted material constitutes fair use is a hotly debated topic that will affect the future of AI. For example, Thomson Reuters and West Publishing Corp. recently sued Ross Intelligence, Inc. over, among other things, its alleged use of machine learning to create a legal research platform for Ross from the Westlaw database. Will fair use protect machine learning?
Existing case law offers some guidance. In Authors Guild v. Google, Inc., the Second Circuit found Google Books’ scanning of more than 20 million books, many of which were subject to copyright, to be a “non-expressive” and transformative fair use of the texts because Google Books enabled users to find information about copyrighted books, as opposed to the expressions contained in the books themselves. If machine learning’s use of copyrighted materials is similarly “non-expressive,” fair use protection is likely available. As long as the AI used in machine learning is not “too smart,” the mechanical digestion of copyrighted works may be permitted.
Of course, AI has evolved far beyond Google Books. AI now has the ability to learn from the way authors express ideas and to generate its own creative output. This expressive machine learning may in turn harm the market for works by human authors. The fact that AI can create outputs that mimic human expression and personalization means that AI’s use of copyrighted works for purposes of machine learning may result in copyright infringement if permission has not been obtained from the owners of those works.