ItaEval: A CALAMITA Challenge

Beatrice Savoldi
2024-01-01

Abstract

In recent years, new language models for Italian have been proliferating. However, evaluation methodologies for these models have not kept pace, remaining fragmented and often limited to the experimental sections of individual model releases. This paper introduces ItaEval, a multifaceted evaluation suite designed to address this gap. By reviewing recent literature on the evaluation of contemporary language models, we devise three overarching task categories—natural language understanding, commonsense and factual knowledge, and bias, fairness, and safety—that a contemporary model should be able to address. Next, we collect a set of 18 tasks encompassing existing and new datasets. The resulting ItaEval suite provides a standardized, multifaceted framework for evaluating Italian language models, facilitating more rigorous and comparative assessments of model performance. We release code and data at https://rita-nlp.org/sprints/itaeval.
Files in this record:
117_calamita_long.pdf (open access; license: not specified; 995.53 kB; Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/353007