
What is Fair Use?
Fair Use is a doctrine in United States copyright law that describes the circumstances under which copyrighted works may be used without seeking the permission of the creator or rights holder. Fair Use exists to advance certain forms of work and communication, such as journalism, teaching, and research, that are judged to have an overriding benefit to society without causing significant harm to the copyright holder.
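Fair use relies on an analysis of four factors:
1. The purpose and character of the use, including whether it is commercial or for nonprofit educational purposes
2. The nature of the copyrighted work
3. The amount and substantiality of the portion used in relation to the copyrighted work as a whole
4. The effect of the use upon the potential market for or value of the copyrighted work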
Fair use is also characterized by how "transformative" the use is determined to be. Simply reproducing another work within the context of your own is generally not considered to be a good basis for a Fair Use argument, while works that are substantially remixed, commented on, analyzed or parodied are more likely to be considered Fair Use.
Source: U.S. Copyright Office Fair Use Index
Fair use is relevant to conversations about artificial intelligence because AI companies in particular have argued that integrating copyrighted works into the datasets used to train AI tools falls under the umbrella of Fair Use. These companies argue that legal precedent supports using these works to train or "educate" AI models, and that the works are radically transformed when they are integrated into the overall corpus of training data, which enables pattern recognition and the mapping of relationships between aspects of the works and their metadata.
Two of the major legal precedents were established in Authors Guild v. HathiTrust (2014) and Authors Guild v. Google (2015), which held that the mass digitization of in-copyright books in order to distill and reveal new information about those books was a fair use. The applications developed by HathiTrust and Google allowed users to browse content by linked metadata (such as discovering books by similar subjects, authors, and publishers) and to explore and analyze text (such as graphing keyword density within a given text or the frequency of certain words in a corpus over time). While these cases did not concern generative AI, they did involve machine learning, and they parallel the arguments AI companies use to describe the transformation of works into data and then into tools and applications whose use is distinctly different from the works' original purpose as creative works.
However, despite these precedents and the arguable transformativeness of the works in this context, others argue that Fair Use cannot be applied to the ingestion of these works into AI tools and training datasets. They argue that other factors in the four-factor analysis are not met, pointing especially to the increasingly commercial nature of AI platforms and applications and to the potential for market harm. One of the key legal cases in this space is that of the New York Times against OpenAI, the company that owns ChatGPT. The Times alleges that because OpenAI ingested New York Times content into its training data, potential subscribers to the newspaper might simply query the chatbot to receive news and analysis generated by Times journalists, rather than reading that content through the publisher's website or print edition. This would, in turn, harm the Times' business, even though ChatGPT users are not simply reading verbatim versions of NYT articles.