Can data be owned?
Data is free-flowing information. There is no standard definition of the term “data.” The Joint Technical Committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) proposes the following definition:
“Reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing.”
At the most basic level, data is just information. For example, the fact that a texture belongs to a genre called “brick” or “steel” is information that cannot, in itself, be appropriated. This lack of ownership stems from the fundamental principle that information, ideas, methods and techniques are free and free-flowing.
Whether any person has a proprietary or “ownership” interest in data depends on whether the law has created a specific property regime, also called “intellectual property,” for that type of data. In most countries, only five types of data can be protected by intellectual property: (i) works of art and other subject matter from the creative industries; (ii) databases; (iii) software; (iv) trademarks; and (v) patentable inventions. Simply put, data that does not fall within one of these categories may not be “owned.” Of course, this does not mean that one is entirely free to use and re-use data that is not “intellectual property,” since other types of restrictions might apply to the data, as discussed below.
Overview of property regimes applicable to data
Ownership of or right to control input data
Parties that intentionally provide data for use in machine-learning or AI development frequently seek to assert ownership or control of the data, or otherwise assert an exclusive right to share and use it. However, ownership in the sense of property is often not available, and personal information is a good example. Personal information is not a proprietary right. It is an access right controlled almost exclusively by the individual to whom the information relates or whom it identifies, and it is difficult for any other party to assert rights over that specific data. For other types of data, such as confidential business information, a customer may want to ensure that it maintains explicit confidentiality rights in its data and that no rights are transferred to the vendor by virtue of the vendor's performance of services for the customer. For a vendor, there may be significant value in controlling the input data so that it may continue to use that data in its AI tool without breaching another party’s rights. Many vendors provide services freely or cheaply in order to generate input data that can be used to train and improve their models.
Ownership of or control of output data
The ownership status of output data is the most heavily contested and difficult point to negotiate in data-related contracts. The output data of an AI model may include direct end-user output data created for use by the AI customer, as well as indirect “output” data that is inputted by the customer and used by the model to improve functionality and efficiency. Output data varies depending on the type of model used and its purpose. There are three main types of output: (a) a prediction; (b) a recommendation; or (c) a classification. Many customers want to “own” the output data because it was created using the input data they provided. A customer could argue that the output data is a derivative work (as that term is used under the U.S. Copyright Act) and that ownership therefore automatically flows through to the customer. Unfortunately, data used in artificial intelligence development or model training is often of uncertain copyright provenance or unequivocally not subject to copyright protection. Even trade secret status is frequently unclear. Vendors, for their part, sometimes argue that it is important that they “own” the output data because they used their own proprietary model to create it. The AI vendor may also want to keep “ownership” of the output data so that it can continue to use that data to train the AI model. In practice, output data is rarely capable of appropriation, so relying on contractual terms delivers far greater certainty. There is little case law on who can claim rights to output data, so when negotiating these types of contracts the parties may want to review the contractual language to ensure that each of their interests is protected.
Use of derived data
Derived data is new data and insights that are derived from the output data and that may not have been available from the existing data. Because derived data is valuable, both customers and vendors may have potential uses for it outside of the contractual agreement for the AI model. One of the issues with derived data is who can control it. A customer could argue that it should have the right to control the derived data because it originally provided the data used to create it. A vendor could argue the opposite, because derived data is created by combining and transforming data into a new type of data. As discussed in the Failed License Cases and Emerging Trends section, vendors can monetize access to and use of the derived data through either a license to a database containing such derived data or the purchase of certain derived data from customers.