On the Impact of Hate Speech Synthetic Data on Model Fairness

Camilla Casula; Sara Tonelli
2025-01-01

Abstract

Although considerable attention has been devoted to online hate speech, some phenomena, such as ableism or ageism, are scarcely represented in existing datasets and case studies. This can lead to hate speech detection systems that perform poorly on underrepresented identity groups. Given the unprecedented capabilities of LLMs in producing high-quality data, we investigate the possibility of augmenting existing data with generative language models to reduce target imbalance. We experiment with augmenting 1,000 posts from the Measuring Hate Speech corpus, an English dataset annotated with target identity information, adding around 30,000 synthetic examples using both simple data augmentation methods and different types of generative models, comparing autoregressive and sequence-to-sequence approaches. We focus our evaluation on model performance across identity groups, finding that performance can differ greatly across targets and that "simpler" data augmentation approaches can improve classification more than state-of-the-art language models.
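To illustrate the two augmentation families the abstract contrasts, the sketch below shows, in Python, a "simpler" EDA-style word-swap augmenter next to generation with an autoregressive language model. This is a hedged, purely illustrative sketch: the paper's actual augmentation operations, model checkpoints, and prompts are not described on this page, so the gpt2 checkpoint, the prompt template, and the function names are assumptions, not the authors' setup.

import random
from transformers import pipeline

def simple_augment(text: str, swap_prob: float = 0.1) -> str:
    # "Simpler" augmentation: randomly swap adjacent words (EDA-style);
    # illustrative only, not necessarily one of the paper's methods.
    words = text.split()
    for i in range(len(words) - 1):
        if random.random() < swap_prob:
            words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

# Autoregressive LLM-based generation (the gpt2 checkpoint and the
# prompt template below are assumptions, not the authors' setup).
generator = pipeline("text-generation", model="gpt2")

def llm_augment(seed_post: str, target_group: str) -> str:
    prompt = f"Post about {target_group}: {seed_post}\nAnother similar post:"
    out = generator(prompt, max_new_tokens=40, do_sample=True)
    # The pipeline returns the prompt plus the continuation; keep only the latter.
    return out[0]["generated_text"][len(prompt):].strip()

In this setup, synthetic examples for an underrepresented target group would be produced by applying either function to seed posts annotated with that target, then adding the outputs to the training data.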
Files in this record:

File: 20_main_long.pdf
Access: open access
License: Creative Commons
Format: Adobe PDF
Size: 1.34 MB

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/365129