In recent years, new language models for Italian have been spurring. However, evaluation methodologies for these models have not kept pace, remaining fragmented and often limited to the experimental sections of individual model releases. This paper introduces ItaEval, a multifaceted evaluation suite designed to address this gap. By reviewing recent literature on the evaluation of contemporary language models, we devise three overarching task categories—natural language understanding, commonsense and factual knowledge, and bias, fairness, and safety—that a contemporary model should be able to address. Next, we collect a set of 18 tasks encompassing existing and new datasets. The so-compiled ItaEval suite provides a standardized, multifaceted framework for evaluating Italian language models, facilitating more rigorous and comparative assessments of model performance. We release code and data at https://rita-nlp.org/sprints/itaeval.
ItaEval: A CALAMITA Challenge
Beatrice Savoldi
2024-01-01
Abstract
In recent years, new language models for Italian have been spurring. However, evaluation methodologies for these models have not kept pace, remaining fragmented and often limited to the experimental sections of individual model releases. This paper introduces ItaEval, a multifaceted evaluation suite designed to address this gap. By reviewing recent literature on the evaluation of contemporary language models, we devise three overarching task categories—natural language understanding, commonsense and factual knowledge, and bias, fairness, and safety—that a contemporary model should be able to address. Next, we collect a set of 18 tasks encompassing existing and new datasets. The so-compiled ItaEval suite provides a standardized, multifaceted framework for evaluating Italian language models, facilitating more rigorous and comparative assessments of model performance. We release code and data at https://rita-nlp.org/sprints/itaeval.File | Dimensione | Formato | |
---|---|---|---|
117_calamita_long.pdf
accesso aperto
Licenza:
Non specificato
Dimensione
995.53 kB
Formato
Adobe PDF
|
995.53 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.