Fair AI Data logotype

Accelerate your business with correct data

Synthetic data can help you harness the power of AI. But synthetic data risk minimizing, or even eliminating, edge cases and minority elements in your real data. New research also shows that synthetic structured data contain intersectional hallucinations: data points that are illogical. Yet the same synthetic dataset can simultaneously contain many intersectional fidelities. You need to know where in your synthetic data those fidelities and hallucinations are. The software tools we develop will help you identify them. We ensure that the synthetic data you use are unbiased, represent edge cases and promote fairness in AI decision-making.

Why balanced synthetic data? 

In real-world datasets, specific groups, demographics or categories are almost always under-represented. Unless the data you're using are analyzed and corrected for imbalances, you risk developing services that exclude parts of your customer base, and needs for services will go unnoticed. By generating synthetic data that are more representative, you can train AI models that are more accurate across your different customer groups.

What is fair?

Fairness is more than just a metric. In fact, fairness is more than many different metrics. And algorithms. And concepts. What kind of fairness do you want in your data? What kinds can you get? What kinds do you need? 


Even if you know which kind of fairness you want to ensure, which protected classes should you be measuring? Which ones are important in your data? Should you be thinking about gender? (“There are more than two, right? But my data only has two?”) Should you be thinking about race? If so, which ones? And how do you define race? Are you even allowed to collect data on race? (Sometimes not…) Does a zip code work as a stand-in? Answering these questions requires a deep understanding of bias and fairness.

Eliminating bias is hard. We know.

To correct for data imbalances and to eliminate bias, you need to know the data, you need the technical skills to understand the metrics and algorithms, and you need to understand fairness. 


Once you have a synthetic dataset, you need to check it for missing edge cases. You must also know where its hallucinations and fidelities are. These determine which domains you can use the dataset in, and which ones you should avoid.
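As a rough illustration of the kind of check described above, here is a minimal Python sketch. It is not our software, and it rests on a simplifying assumption: an attribute combination that appears in the synthetic data but never in the real data is flagged as a candidate hallucination, one that appears in both is counted as a fidelity, and one that appears only in the real data is a candidate missing edge case. The function names, column names and sample rows are all hypothetical. Absence from the real data does not prove a combination is illogical, so flagged items still need review by someone who knows the domain.

```python
from itertools import combinations


def value_combinations(rows, columns):
    """Return the set of value tuples observed for the given columns."""
    return {tuple(row[c] for c in columns) for row in rows}


def compare_intersections(real_rows, synthetic_rows, columns, size=2):
    """Compare attribute combinations of a given size between two datasets.

    Assumption for this sketch: "hallucination" means a combination seen
    only in the synthetic data, "fidelity" means one seen in both, and
    "missing" means one seen only in the real data.
    """
    report = {}
    for cols in combinations(columns, size):
        real = value_combinations(real_rows, cols)
        synth = value_combinations(synthetic_rows, cols)
        report[cols] = {
            "fidelities": synth & real,
            "hallucinations": synth - real,
            "missing": real - synth,
        }
    return report


# Hypothetical toy data, for illustration only.
real = [
    {"age_group": "child", "marital_status": "single", "employment": "none"},
    {"age_group": "adult", "marital_status": "married", "employment": "full-time"},
    {"age_group": "adult", "marital_status": "single", "employment": "part-time"},
]
synthetic = [
    {"age_group": "child", "marital_status": "married", "employment": "none"},
    {"age_group": "adult", "marital_status": "married", "employment": "full-time"},
]

report = compare_intersections(
    real, synthetic, ["age_group", "marital_status", "employment"]
)
for cols, result in report.items():
    print(cols, result)
```

In this toy example the synthetic data contain a married child, a combination the real data never show, and they drop both single records. Checking pairs of attributes is only a starting point; larger combinations can be checked by raising the size argument, at a growing computational cost.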


And then come the concerns of the real world. How can your synthetic data be aligned with your current practices? What compliance regulations will you need to follow? How can your synthetic dataset inspire innovation and new business?


Our software tools, derived from research in critical data studies and ML, will help.

More to read


The landscape for Fair AI is in flux. Here are some resources to help you and your team keep up to date: