EuroVerdict: A Multilingual Dataset for Verdict Generation Against Misinformation

Russo, Daniel; Sadeghi, Fariba; Menini, Stefano; Guerini, Marco
2025-01-01

Abstract

Misinformation is a global issue that shapes public discourse, influencing opinions and decision-making across various domains. While automated fact-checking (AFC) has become essential in combating misinformation, most work in multilingual settings has focused on claim verification rather than on generating explanatory verdicts (i.e., short texts discussing the veracity of a claim), leaving a gap in AFC resources beyond English. To this end, we introduce EuroVerdict, a multilingual dataset designed for verdict generation, covering eight European languages. Developed in collaboration with professional fact-checkers, the dataset comprises claims, manually written verdicts, and supporting evidence, including fact-checking articles and additional secondary sources. We evaluate EuroVerdict with Llama-3.1-8B-Instruct on verdict generation under different settings, varying the prompt language, input article language, and training approach. Our results show that fine-tuning consistently improves performance, with models fine-tuned on original-language articles achieving the highest scores in both automatic and human evaluations. Using articles in a different language from the claim slightly lowers performance; however, pairing them with language-specific prompts improves results. Zero-shot and Chain-of-Thought setups perform worse, reinforcing the benefits of fine-tuning for multilingual verdict generation.
Files in this item:
2025.findings-acl.853.pdf (open access; license: publisher's copyright; 6.43 MB; Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/369668