Introduction
Given the uncertainty surrounding many of the issues presented in this Legal Issues of AI in the Entertainment and Media Sector section, litigation relating to AI licensing has increased significantly. Beyond the usual intellectual property question of whether data used to train models violates copyright, privacy, or similar rights restrictions, the training of generative AI, machine learning, and other types of data-centric development is increasingly becoming an issue in many transactions.
Below we summarize a few of the ongoing cases that deal with licensing issues and present emerging trends in this matter.
Failed license cases
- CoPilot Case: On Nov. 2, 2022, a class action was filed on behalf of software developers against GitHub Inc., Microsoft Corp., and the OpenAI entities, alleging violation of the Digital Millennium Copyright Act (DMCA) and breach of the open-source licenses governing the developers’ source code, arising from the release of GitHub CoPilot. CoPilot is a generative AI tool built by GitHub to assist programmers while they are coding within the platform. In this case, the underlying work consists of source code that developers outside GitHub and Microsoft created for a public library on GitHub under an open-source license. One of the requirements these developers established in their open-source license was that copyright notices within their source code must be reproduced when the code is used as the basis of derivative software or code.
Under such licenses, original source code may generally be used and distributed to third parties as long as proper attribution is provided. The plaintiffs allege that CoPilot instead removes or alters the copyright information in the source code and then reproduces the code, without the requisite copyright information, to CoPilot users. In response, GitHub and Microsoft have argued that CoPilot does not need to reproduce the copyright information because it is not built around the code in plaintiffs’ open-source library but is based on all code developed and stored within GitHub. The judge presiding over the case recently granted GitHub’s request to dismiss most of the claims, including the copyright infringement claim, but gave the plaintiffs leave to amend and refile.
- On Jan. 13, 2023, Sarah Andersen, Karla Ortiz, and Kelly McKernan filed a class action against Stability AI Ltd., Stability AI Inc., Midjourney Inc., and DeviantArt Inc., alleging infringement of copyrighted images of the plaintiffs’ artwork, as well as breach of contract, unfair competition, and violation of their right of publicity. The named plaintiffs are artists who claim that Stability AI used their artwork to train its algorithms without consent.
Stability AI has responded by arguing that it should face claims only for copying registered works, and that the named plaintiffs had not registered their works before filing suit. Further, the defendants claim that none of their output images are substantially similar to any copyrighted works, and thus they could not be infringing existing copyrights. They also argue that the failure to plead direct infringement defeats any claim under the DMCA or for vicarious infringement. Finally, they argue that the unfair competition and right of publicity claims are preempted by the copyright claims and should therefore be dismissed.
- On Feb. 6, 2023, Getty Images also sued Stability AI for copyright infringement, as well as for trademark infringement. Getty Images is a leading creator and distributor of digital content, primarily photographic images. The images are either created by staff or hired photographers, or acquired from third parties, with the applicable copyrights assigned or licensed to Getty Images. Getty Images alleges that the content scraped from its websites was collected without consent and that certain images produced by Stability AI’s software contain a modified version of the signature Getty watermark, raising concerns under trademark law about a perceived association between Getty and Stability AI and about dilution of Getty’s trademark. Further, Getty has previously licensed its content to other companies, including companies that, like Stability AI, use the content to train generative AI models. While that does not mean Stability AI’s use violated copyright and trademark law, it does limit the fair use arguments Stability AI could make. Stability AI has yet to argue the merits of the case, but its arguments are likely to mirror those in the class action, namely that its output images are not substantially similar to any of the copyrighted works.
Emerging trends
The number of businesses building and developing products on freemium or give-to-get models, where part of the value proposition is a reduced price or access to additional features and functionality, continues to grow. Providers of such services often receive a broader license than a business might otherwise grant in an arm’s-length transaction. Companies are likely to clamp down on access to and use of some tools, or may struggle with broad employee use outside of policy, as they have with many other nascent services. For example, many commentators on social media have suggested that companies not include sensitive or proprietary information in prompts or otherwise when using large language models or similar tools.
For many years companies have made products available on a “give-to-get” basis, whereby dashboards, analytics, and other types of tools and value are built or made available on the strength of the economic network effect generated when the community as a whole benefits from increased usage by many. Larger enterprises have often sought to use their market power or leverage to obtain the benefits of such effects while restricting or limiting how any data they provide, or data generated about their usage, is used or incorporated into models, machine learning, and beyond.
Hype surrounding artificial intelligence is driving greater conflict as more organizations awaken to the risks (and rewards) of using data to derive insight and analysis, and the pace of artificial intelligence model development is accelerating this trend. Many have projected that large and less-encumbered data lakes can and will provide a competitive advantage for some players. However, recent developments in open-source model development suggest such advantages may be short-lived. Wherever trends go, data continues to emerge as one of the most important asset classes of the twenty-first century.
- Data ownership in AI is becoming contentious, with businesses seeking to balance access to tools and data against concerns about proprietary information.