Entertainment and Media Guide to AI


Read time: 2 minutes

Does the videogame sector hold the key to the AI training problem?

Obtaining high-quality, accurate training data is the goal of any data scientist developing AI systems. Yet obtaining enough quality data to train and optimize a model is fraught with difficulty. High-quality training sets are hard to find and often comprise a wide range of assets that require extensive data “cleaning” and “normalizing” before they can be used for training. Furthermore, the presence of personal data or copyright-protected material in these training sets can render their use illegal, or at least legally uncertain, in most jurisdictions.

Author: Sophie Goossens

The games industry may hold the key to unlocking this issue. The solution? Synthetic data.

Synthetic data, or AI-generated data, is “manufactured” data generated by an AI system and used in place of real-life data obtained from the field. It can replace collected data while preserving or mimicking its properties, or it can supplement collected data to improve its completeness or to enhance privacy protections. For example, it can be a powerful tool for generating synthetic medical imaging or self-driving car scenarios. In addition, developers can use synthetic data to add more diversity to their training sets and help remove biases that are often found in real-world data sets.
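To make the idea of “preserving or mimicking” the properties of collected data concrete, the sketch below fits a simple statistical model to a small set of measurements and then draws new, manufactured values from it. It is a minimal illustration only: the measurements are invented, the function names `fit_gaussian` and `synthesize` are hypothetical, and a single Gaussian distribution stands in for the far richer generative models and game-engine renderers used in practice.

```python
import random
import statistics


def fit_gaussian(values):
    """Summarise the collected data by its mean and standard deviation."""
    return statistics.mean(values), statistics.stdev(values)


def synthesize(values, n, seed=0):
    """Draw n synthetic values from a Gaussian fitted to the collected data.

    Assumption for illustration: the data is roughly bell-shaped, so a
    Gaussian is a reasonable stand-in for a real generative model.
    """
    mu, sigma = fit_gaussian(values)
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical "real-life" measurements, invented for this example.
collected = [61.0, 64.5, 58.2, 70.1, 66.3, 59.8, 63.7, 68.4]

# Manufactured data: none of these values were collected from the field,
# but their overall statistics track those of the original sample.
synthetic = synthesize(collected, n=1000)

print("collected mean:", round(statistics.mean(collected), 1))
print("synthetic mean:", round(statistics.mean(synthetic), 1))
```

The synthetic set here is much larger than the sample it was derived from, which is the sense in which synthetic data can supplement collected data. Whether the result is fit for training depends on how faithfully the chosen model captures the properties that matter.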

From a legal perspective, synthetic data can be an interesting alternative to real-life data especially with respect to the following issues:

  • When training sets include data protected by copyright, since data generated by an AI are regarded as public domain in most legal systems;
  • When accessing the “real-life” data set requires negotiating time-consuming, technical, and costly data sharing agreements;
  • When the real-life data sets are sensitive and contain personal information about individuals, subjecting their processing, storage and transfer to strict compliance requirements; and
  • When training sets are known to be biased or incomplete.

Synthetic data is a booming market and a phenomenal opportunity for the games sector to service the entire AI industry. Game engine tools can bring any type of dataset to life in digital or video format and render thousands of images for AI systems to detect or analyze, including human faces, existing or future product packaging, landscapes, heritage sites, or sounds. In a world where the use of “proprietary” data by technology companies is increasingly being challenged, synthetic data offers a formidable opportunity to increase legal certainty. Synthetic data may not be capable of replacing real-world data in every instance, but it can certainly contribute to propelling the value of the games sector to new heights.

Key takeaways
  • Obtaining high-quality, accurate training data is the goal of any data scientist developing AI systems and the games industry may hold the key to unlocking this issue
  • The solution? Synthetic data
  • Synthetic data, or AI-generated data, is ‘manufactured’ data generated by an AI system and used in place of real-life data obtained from the field