For data startups, getting their hands on high-quality datasets on which to train their algorithms can be difficult, not to mention expensive. An increasingly common alternative is synthetic data: computer-generated data that mimics real data closely enough to train AI applications. Synthetic data is not yet a full substitute for the real thing, but an MIT study found that models trained on it performed about as well as models trained on real data roughly 70% of the time, and the techniques are likely to improve.
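To make the train-on-synthetic idea concrete, here is a minimal sketch of one common evaluation setup: fit a simple generative model to real data, sample a synthetic training set from it, and compare a classifier trained on the synthetic data against one trained on the real data. The toy dataset, the per-class Gaussian mixtures, and all sizes below are illustrative assumptions, not the MIT study's actual methodology.

```python
# Sketch: train on synthetic data, evaluate on real data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Stand-in for a proprietary real-world dataset.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit one generative model per class, then sample a synthetic training set.
synth_X, synth_y = [], []
for label in np.unique(y_train):
    gmm = GaussianMixture(n_components=5, random_state=0)
    gmm.fit(X_train[y_train == label])
    samples, _ = gmm.sample(1000)
    synth_X.append(samples)
    synth_y.append(np.full(1000, label))
synth_X, synth_y = np.vstack(synth_X), np.concatenate(synth_y)

# Train identical classifiers on real vs. synthetic data; test on real data.
real_clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
synth_clf = LogisticRegression(max_iter=1000).fit(synth_X, synth_y)
print("trained on real data:     ", real_clf.score(X_test, y_test))
print("trained on synthetic data:", synth_clf.score(X_test, y_test))
```

On simple tabular data like this, the two scores typically land close together; the gap widens as the real data's structure gets harder for the generative model to capture.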
Nanalyze explains the broader use case for synthetic data:
…why use synthetic data over real data? One of the top reasons is what has become a tech buzzword in recent years: democratization. The experts say that startups trying to get out of the gate are at a disadvantage against the data-rich Google giants of the world because they don’t have the resources to build the big real-world datasets required to train algorithms. Synthetic data also eliminates the privacy problems that might hamstring machine learning applications in areas like healthcare. The poster child for privacy breaches, Facebook, announced earlier this year that it would turn to synthetic data for its upcoming AI efforts.
Finally, synthetic data also helps companies large and small scale up their AI training efforts. For example, the self-driving company Waymo has tested its technology over the course of millions of real miles, as well as billions of miles on simulated roadways. Some are even turning to video games like Grand Theft Auto to train autonomous cars to drive during the Apocalypse.
[…]
If AI is the new electricity, then you might think of synthetic data as a potentially cheaper and faster way of generating the power necessary to charge AI algorithms. However, synthetic data techniques are still in the early stages of testing and vetting, which is reflected in the mostly young, modestly funded group of startups that we found. But with companies like Facebook in the market for synthetic data, expect that dynamic to change quickly.