Large-scale image-text datasets like LAION-5B [1] are foundational for generative AI (Stable Diffusion, DALL-E, FLUX). Yet their uncurated, web-scraped nature raises critical concerns about embedded biases.
Instead of focusing on harmful content or the outputs of trained models, we analyze the underlying demographic composition of LAION-5B for three types of bias [5,6]:
Representational bias: unbalanced demographic group prevalence
Stereotypical bias: unwarranted associations between demographics and other attributes
Intersectional bias: compounding effects at the intersection of multiple demographics
Analysis Pipeline
Models Used
RetinaFace: Face detection (≥48×48 px filter)
FairFace [2]: Age, gender, race classification
DeepFace [3]: Age, gender, race (cross-validation)
Emo-AffectNet [4]: 7-class facial expression recognition
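The face-size filter in the first pipeline step can be sketched as follows. This is a minimal illustration, assuming detections arrive as (x1, y1, x2, y2) pixel boxes and that "≥48×48 px" means both sides must be at least 48 px; the box format and the function name are hypothetical and not taken from the RetinaFace API.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]

MIN_FACE_SIDE = 48  # threshold stated in the pipeline description


def keep_large_faces(boxes: List[Box], min_side: float = MIN_FACE_SIDE) -> List[Box]:
    """Keep only boxes whose width AND height are at least min_side pixels."""
    return [
        (x1, y1, x2, y2)
        for (x1, y1, x2, y2) in boxes
        if (x2 - x1) >= min_side and (y2 - y1) >= min_side
    ]
```

Faces that pass this filter would then be handed to the attribute and expression classifiers.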
Bias Metric: Ducher's Z
Compares observed co-occurrence of group g and class y to expected co-occurrence if independent. Ranges from −1 (max. underrepresentation) to +1 (max. overrepresentation), with 0 indicating no association. Used for both intersectional and stereotypical bias analysis [5].
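A minimal sketch of how Z can be computed from a contingency table of counts (rows: groups g, columns: classes y). It follows the commonly cited definition of Ducher's Z, in which the gap between observed and expected joint probability is normalized by the largest gap attainable given the marginals. The function name and the tolerance parameter are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def duchers_z(counts, tol=1e-12):
    """Ducher's Z for every (group, class) cell of a count table."""
    counts = np.asarray(counts, dtype=float)
    joint = counts / counts.sum()            # observed p(g, y)
    p_g = joint.sum(axis=1, keepdims=True)   # marginal p(g), shape (G, 1)
    p_y = joint.sum(axis=0, keepdims=True)   # marginal p(y), shape (1, Y)
    expected = p_g * p_y                     # p(g) p(y) under independence
    diff = joint - expected

    # Largest possible positive and negative deviations given the marginals.
    upper = np.minimum(p_g, p_y) - expected
    lower = expected - np.maximum(0.0, p_g + p_y - 1.0)

    z = np.zeros_like(joint)
    pos = diff > tol      # overrepresented cells
    neg = diff < -tol     # underrepresented cells
    z[pos] = diff[pos] / upper[pos]
    z[neg] = diff[neg] / lower[neg]
    return z
```

For example, `duchers_z([[30, 10], [20, 40]])` gives 0.5 on the diagonal and −0.5 off the diagonal, while a table with identical rows gives 0 everywhere.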
We analyze the 2024 re-release of LAION-5B, studying its two main components (English and multilingual) separately.
Although our sample covers only ~0.02% of the full dataset, it provides a worst-case margin of error of ±0.51% at 95% confidence for the reported proportions.
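A worst-case margin of error for a proportion is usually obtained from the normal approximation at p = 0.5, where the binomial variance is largest. The sketch below assumes simple random sampling and z = 1.96 for 95% confidence; the function names are illustrative, and the sample size n must be taken from the study itself.

```python
import math


def worst_case_margin(n: int, z: float = 1.96) -> float:
    """Half-width of the confidence interval for a proportion at p = 0.5."""
    return z * math.sqrt(0.25 / n)


def required_sample_size(margin: float, z: float = 1.96) -> int:
    """Smallest n whose worst-case margin does not exceed the given margin."""
    return math.ceil((z / margin) ** 2 * 0.25)
```

Because the margin depends on the sample size rather than on the sampled fraction, a sample that is a tiny share of a very large dataset can still yield tight estimates of proportions.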
Image content was hash-verified against LAION-5B metadata to ensure data integrity.
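Such a check can be sketched as below, assuming the metadata stores a hexadecimal digest of the raw image bytes. The digest algorithm (MD5 here) and the function name are assumptions for illustration and should be matched to whatever the metadata actually records.

```python
import hashlib


def matches_metadata_hash(image_bytes: bytes, expected_hex: str,
                          algorithm: str = "md5") -> bool:
    """Return True if the digest of the downloaded bytes equals the metadata digest."""
    digest = hashlib.new(algorithm, image_bytes).hexdigest()
    return digest == expected_hex.strip().lower()
```

Images that fail the check (for example, because the URL now serves different content) would be excluded from the sample.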
Age: Both models show strong overrepresentation of 20–39 year-olds. LAION-2B-en skews younger than LAION-2B-multi.
Gender: Consistent male predominance (57–61% male with FairFace), with a larger gap in LAION-2B-multi.
Race: White is the largest group at 50–60%. All other groups are consistently underrepresented across both models and both dataset components.
Emotions: Facial expression distribution is dominated by "Happiness" (33–36%) and "Neutrality" (25–26%).
Gender–Age: The strongest consistent bias. Females are overrepresented below age 30 (peak: Z = 0.35 at 20–29), while males dominate above 30, reaching Z = 0.61 at 60–69.
Age–Race: The oldest groups (60+) are underrepresented across most races. Among infants, White is overrepresented while all other race groups are underrepresented.
Gender–Race: Weaker and less consistent between models.
Emotion–Gender: Strongest stereotypical bias. Males are overrepresented in "Anger" (Z = 0.42), females in "Happiness" (Z = 0.19). This echoes the "angry-man-happy-woman" stereotype from psychology [7].
Emotion–Age: Under-30 groups are underrepresented in "Anger" and "Disgust," while older groups (60+) are underrepresented in "Fear," "Sadness," and "Surprise."
Emotion–Race: Subtle and model-dependent.
Massive demographic imbalances: Young adults (20–39), White individuals (50–60%), and males (57–70%) are heavily overrepresented. Minority racial groups and older women are consistently underrepresented.
"Angry-man-happy-woman": Strong stereotypical biases link "Anger" and "Disgust" disproportionately to males, and "Happiness" to females.
Multilingual ≠ more diverse: LAION-2B-en and LAION-2B-multi exhibit remarkably similar bias profiles. The multilingual component shows only slightly greater racial/age diversity, at the cost of increased gender disparity.
Systemic and deeply embedded: These biases are shared with most web-scraped datasets, and potentially affect most generative AI models.
LAION-5B exhibits deeply embedded demographic imbalances that are consistent across dataset components and demographic models.
Future work should trace how these biases propagate through specific generative pipelines, validate findings with human annotations, and extend this audit to other large-scale datasets such as COYO-700M and DataComp.
[1] Schuhmann et al. (2022). LAION-5B: An open large-scale dataset for training next generation image-text models. NeurIPS.
[2] Karkkainen & Joo (2021). FairFace: Face attribute dataset for balanced race, gender, and age for bias measurement and mitigation. WACV.
[3] Serengil & Ozpinar (2024). A benchmark of facial recognition pipelines and co-usability performances of modules. J. Inf. Technol.
[4] Ryumina et al. (2022). In search of a robust facial expressions recognition model: A large-scale visual cross-corpus study. Neurocomputing.
[5] Dominguez-Catena et al. (2024). Metrics for dataset demographic bias: a case study on Facial Expression Recognition. IEEE TPAMI.
[6] Buolamwini & Gebru (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. FAccT.
[7] Becker et al. (2007). The confounded nature of angry men and happy women. JPSP.
Birhane et al. (2023). On hate scaling laws for data-swamps. arXiv.
Luccioni et al. (2023). Stable bias: Evaluating societal representations in diffusion models. NeurIPS.
Nicoletti & Bass (2023). Humans are biased. Generative AI is even worse. Bloomberg.