When real-world data isn’t enough, firms are increasingly turning to synthetic data to train their AI models.
Examples include financial services firms training AI to detect fraud and hospitals generating fake patient data to train algorithms that make medical decisions.
From the Wall Street Journal:
Companies rely on real-world data to train artificial-intelligence models that can identify anomalies, make predictions and generate insights. But often, it isn’t enough.
To detect credit-card fraud, for example, researchers train AI models to look for specific patterns of known suspicious behavior, gleaned from troves of data. But unique, or rare, types of fraud are difficult to detect when there isn’t enough data to support the algorithm’s training.
To get around that, companies are learning to fake it, building so-called synthetic data sets designed to augment training data.
At American Express Co., machine-learning and data scientists have been experimenting with synthetic data for nearly two years in hopes of improving the company’s AI-based fraud-detection models, said Dmitry Efimov, head of the company’s Machine Learning Center of Excellence. The credit-card company uses an advanced form of AI to generate fake fraud patterns aimed at bolstering the real training data.