Entertainment and Media Guide to AI

Legal issues in AI part 2


The use, ownership and exploitation of data are extremely valuable. The era of AI has ushered in a veritable gold rush of companies and individuals seeking to mine this man-made resource, which, unlike gold, is available in great abundance. However, the alchemy involved in turning a seemingly infinite supply of raw data into something valuable requires tremendous computational power and investment.

Text and Data Mining (TDM) generally involves identifying previously unknown patterns or relationships in data sets. In the retail context, for example, TDM can be used to build predictive models of behavior, so that when a customer visits Amazon or opens their Facebook page, they are presented with advertising keyed to their individual tastes and preferences.

In the media and entertainment context, one form of TDM, machine learning, is being used to train AI programs to create content, whether in text, audio, visual or audiovisual form. Machine learning, like traditional TDM, is intended to discover novel and useful knowledge in data. A fundamental difference, however, is that TDM in and of itself can extract data for human comprehension, whereas machine learning extracts data to improve an AI program's own understanding and its ability to produce output. In addition, TDM does not necessarily involve rule or pattern discovery, while machine learning almost always does.

TDM in the U.S.: What is ‘fair use’ anyway?

As discussed in the Geopolitics of AI section, the legality of copying text or data through TDM has become a serious issue. As AI search engines endlessly crawl the World Wide Web, seeking, digesting, and aggregating content, they inevitably digest copyrighted works such as music videos, songs, novels, and news stories. Because this digestion, which generally requires making a copy, is frequently performed without the express consent of the copyright holder, its legality often depends on whether it is permitted under an exception to copyright law or falls outside the framework of copyright law altogether. Under U.S. copyright law, the exception most frequently relied upon is fair use.

Under section 107 of the Copyright Act, fair use is assessed using a four-factor test: (1) the purpose and character of the use; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use on the potential market for, or value of, the copyrighted work. Section 107 expressly names teaching, scholarship, and research among the purposes for which use of a copyrighted work may be fair, although such uses must still be weighed under the four factors. A key consideration for courts in deciding whether a use is fair is whether it is "transformative."

Whether copying copyrighted material for the purpose of machine learning constitutes fair use is a hotly debated question that will shape the future of AI in the United States. For example, Thomson Reuters and West Publishing Corp. have sued Ross Intelligence, Inc. over, among other things, Ross' alleged use of machine learning to build its legal research platform from the Westlaw database. The case is still pending, and Ross' motion to dismiss the copyright infringement claim was denied.1

Will fair use protect machine learning?

In a seminal 2015 case, the Second Circuit found Google Books' scanning of more than 20 million books, many of which were protected by copyright, to be a non-expressive, transformative fair use, because Google Books enabled users to find information about copyrighted books, as opposed to the expression contained in the books themselves.2 A key takeaway from the case was the distinction the court drew between "expressive" and "non-expressive" uses of copyrighted materials, the latter being deemed fair use. Applied to AI, could this mean that, so long as the original text is not expressed in the final work product, the act of machine reading is fair use?

We are not aware of any U.S. court applying fair use specifically in the context of TDM, in part because cases considering AI functionality have often involved the express use of copyrighted material that qualified as traditional copyright infringement. For example, in a 2018 case, the Second Circuit found that although TVEyes' "search feature" for Fox News content might on its own have been sufficiently transformative to be fair use, TVEyes also offered a "watch feature" that redistributed copyrighted Fox News content to its users for a monthly fee, and that feature defeated the fair use defense (Fox News Network, LLC v. TVEyes, Inc., No. 15-3885 (2d Cir. Feb. 27, 2018)).

In practice, major TDM search projects are generally governed by contract, which has kept litigation relatively rare. Academic and commercial arguments have also been raised against over-reliance on fair use for TDM. As a practical matter, a key question U.S. courts will ask is whether TDM deprives the copyright owner of the value of its copyrighted material.

AI licensing

Rights to collect, use and share data are typically allocated through licenses, whether in advance or to create business certainty. A license is a right or permission for a person or company to use another party's intellectual property, typically in exchange for a fee. The benefit of the licensing model is its tremendous flexibility: parties can slice, dice, allocate, monetize, expand and limit the collection, use and disclosure of data. This is especially useful where the more traditional intellectual property rights of patent, copyright, trademark and trade secret law may be less clear, or where views differ, because licensing can resolve these issues between businesses and even with consumers. In particular, licensing has broadly enabled many of the data-focused innovations of the Internet age. Licensing also helps to address privacy and data protection issues in many legal systems. In the U.S., for example, privacy policies often address these issues, and terms of use or terms of service frequently grant licenses to things that may or may not be protected by traditional intellectual property rights.

Licensing can also address confidentiality, usage considerations or limitations and, increasingly, learning and other issues, often experiential and machine-aided, that arise in connection with the collection, use and disclosure of data. For example, secondary or derivative usage of data, which may not be protected by copyright or trade secret law, is increasingly addressed by contract. Similarly, residuals, meaning information in intangible form that may be remembered by persons who had access to confidential information, are an increasingly important consideration for parties exchanging confidential information. The information generated by a business relationship can itself be valuable, but so are the questions of who has a right to keep it secret and whether and how the counterparty may use it. These questions have become so important that the entire enterprise value of certain businesses has been written off when rights in underlying data were questioned, and, more recently, acquisition transactions have had their purchase prices adjusted, or have failed to close, because of uncertainty about data rights.

With this in mind, it is helpful to understand the contractual provisions commonly used in licenses to govern the collection, use and disclosure of data.

Key takeaways
  • Most AI systems are trained by analyzing and extracting information from vast quantities of data
  • Copyrighted material is making its way into AI products, potentially changing the way that data is licensed
  • Purchasers of AI systems should consider clauses in contracts to protect themselves from new AI risks