Introduction
Copyright is a territorial beast, and not all countries have taken the same approach to the text and data mining (TDM) debate.
The U.S. approaches TDM through its doctrine of “fair use,” which permits limited use of copyright protected material without first acquiring permission from the copyright holder, in particular where the contemplated use is deemed “transformative.”
Copying copyright protected content for TDM purposes
In the United States, the reproduction right is reserved for the copyright owner of a work or its licensees under section 106 of the U.S. Copyright Act of 1976. While U.S. copyright law contains no express exception for TDM, section 107 of the Copyright Act authorizes the fair use of a copyright protected work, “including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching […], scholarship, or research.” The technology sector has traditionally considered copying copyright protected works for the sole purpose of text and data mining to be a case of fair use. The creative sector disagrees, and the launch of generative AI solutions capable of producing photos, paintings and music at the push of a button has seen copyright holders rally behind the “unfair use” banner to condemn the use of their content by AI businesses.
What is fair use?
To determine whether the use of a copyright protected work without the consent of the copyright owner constitutes non-infringing fair use, courts balance the following four factors in a case-by-case, highly fact-specific inquiry:
- The purpose and character of the use, including whether the use is of a commercial nature or is for non-profit educational purposes;
- The nature of the copyright protected work;
- The amount and substantiality of the portion used in relation to the copyright protected work as a whole;
- The effect of the use upon the potential market for or value of the copyright protected work.
The first factor. The first factor, also known as the “transformative use factor,” is generally the most heavily weighted by the courts. A use is transformative if, rather than merely superseding the existing work, it adds something new, with a further purpose or different character, altering the first work with new expression, meaning or message.[1] Even where a work is copied and stored in substantially the same form as the original without meaningful alteration, the use may still be considered transformative in nature, so long as the use by the would-be copier serves a materially different function than the original work.[2]
Examples where courts have found a use to be transformative include making digital copies of student papers for use in anti-plagiarism software (where the defendant’s use of the works was unrelated to such works’ expressive content),[3] and scanning books to create a full-text searchable database and public search function (in a manner that did not allow users to read the texts).[4] While educational and non-commercial uses are generally more likely to be found to be fair use, courts will not necessarily find a commercial use to be unfair and will instead balance the purpose and character of the use against the other factors.
Copies of original works made for TDM purposes appear to have a purely functional purpose, namely, to teach an AI model about the underlying characteristics of a work through pattern recognition. Such copies are typically not released or made available to the public, so treating them as transformative would appear to be consistent with existing case law.
- In the U.S., in the absence of a TDM exception, AI companies contend that the inclusion of copyrighted materials in training sets constitutes fair use, i.e., not copyright infringement, a position that remains to be evaluated by the courts.
- The legality of data mining can depend on various factors, including the type of data being mined and the purpose for which the data is being collected and used.
 
            