AI development: soon only with synthetic data?

28 June 2022, by Horst Buchwald

San Francisco, 6/28/2022

According to an analysis by Bloomberg, the majority of AI developers use fake, or “synthetic”, data to train AI systems and to avoid possible biases.

Synthetic data has the same statistical properties as real data. Developers prefer it when the real data they need is too expensive to collect, occurs only rarely, does not exist, or cannot be accessed.
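As a rough illustration of what "the same statistical properties" can mean, here is a minimal Python sketch. It is a toy under stated assumptions, not the method used by any company named in this article: the "real" measurements are made up, the helper name fit_and_sample is hypothetical, and a single normal distribution stands in for the far more complex models used in practice.

```python
import random
from statistics import NormalDist, mean, stdev


def fit_and_sample(real_values, n, seed=0):
    """Fit a normal distribution to real_values and draw n synthetic values.

    Hypothetical helper for illustration only: it preserves just the
    mean and standard deviation of a single numeric column.
    """
    model = NormalDist.from_samples(real_values)
    return model.samples(n, seed=seed)


# Made-up stand-in for a column of real measurements (an assumption).
rng = random.Random(42)
real = [rng.gauss(50.0, 10.0) for _ in range(2000)]

# Synthetic values drawn from the fitted model, not copied from the real data.
synthetic = fit_and_sample(real, 2000, seed=1)

# Compare summary statistics of the real and synthetic columns.
print("real:     ", round(mean(real), 1), round(stdev(real), 1))
print("synthetic:", round(mean(synthetic), 1), round(stdev(synthetic), 1))
```

The two printed lines should show similar means and standard deviations, even though none of the synthetic values is taken from the real list. Real image or tabular generators must also preserve relationships between many variables, which this sketch does not attempt.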

One example is Simi Lindgren’s website Yuty, which analyzes selfies to recommend skin care products. Lindgren wanted to train an AI system on facial images, but she did not have enough photos of dark-skinned women. Instead, she turned to generative adversarial networks (GANs) to create hundreds of thousands of photorealistic images of people with different skin tones.

Against this background, Gartner predicts that by 2024 about 60% of the data used for AI and analytics projects could be synthetically generated, and that from 2030 onward synthetic data would overtake real data in AI training.

According to StartUs Insights, more than 50 startups now produce synthetic data as a service. Companies in this space include Mostly AI, which generates synthetic data using algorithms trained on a company’s real data, and Datagen Technologies, a platform that generates synthetic data for computer vision systems.

