Intersectional hallucination (IH) in generative models refers to the production of synthetic datasets containing unrealistic or logically inconsistent combinations of features that do not occur in the original data. This phenomenon can distort analyses, for example when datasets contain demographic information in domains such as healthcare, finance, criminal justice, and marketing. In healthcare, IH could produce synthetic data points that pair medical conditions with treatments that are unlikely or medically inaccurate.
On the other hand, intersectional hallucination can be beneficial in applications where privacy is crucial, or it can support data augmentation by introducing novel combinations of features that expand the diversity and variability of the dataset and potentially improve the robustness and generalization of ML models.
Careful examination of the synthetic dataset is necessary to ensure these hallucinations are not problematic.
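One way such an examination might look in practice is a simple rule-based consistency check. The sketch below is purely illustrative: the column names, example records, and rules are hypothetical assumptions loosely inspired by census-style data, not the schema or tooling used in our work.

```python
import pandas as pd

# Hypothetical synthetic records loosely modeled on census-style data;
# the column names and values below are illustrative assumptions.
synthetic = pd.DataFrame({
    "age": [34, 12, 58],
    "education": ["Bachelors", "Doctorate", "HS-grad"],
    "hours_per_week": [40, 60, 20],
})

# Example consistency rules: each maps a label to a boolean mask that
# flags rows whose combination of features is implausible.
rules = {
    # A doctorate held by someone under 18 is an unlikely intersection.
    "minor_with_doctorate": (synthetic["age"] < 18)
                            & (synthetic["education"] == "Doctorate"),
    # Full-time work hours reported for a child is another suspect combination.
    "child_working_fulltime": (synthetic["age"] < 14)
                              & (synthetic["hours_per_week"] >= 35),
}

# Collect rows that violate at least one rule.
flags = pd.DataFrame(rules)
hallucinated = synthetic[flags.any(axis=1)]
print(hallucinated)  # the 12-year-old with a doctorate is flagged
```

Whether a flagged row is a problem or a useful augmentation depends, as noted above, on the application.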
We are exploring intersectional hallucinations and fidelity in structured synthetic data. Here is a recent piece Johnson & Hajisharif wrote about them in AI & Society, based on the hallucinations produced in synthesized 1990 US Adult Census Data (which you can read more about here).
Please join the conversation! We’d like to know what you think about intersectional hallucinations. We ask for your name and email address, but feel free to submit your thoughts anonymously if you prefer. These are questions we are particularly interested in, but please share whatever is on your mind.
Comments in this discussion forum may be used to direct further research on intersectional hallucinations and to hone our tools for finding and addressing them.
© Fair AI Data