ItaEval and TweetyIta: A New Extensive Benchmark and Efficiency-First Language Model for Italian
Beatrice Savoldi
2024-01-01
Abstract
Current development and benchmarking efforts for modern, large-scale Italian language models (LMs) are scattered. This paper situates such efforts by introducing two new resources: ItaEval, a comprehensive evaluation suite, and TweetyIta, an efficiency-first language model for Italian. Through ItaEval, we standardize evaluation across language understanding, commonsense and factual knowledge, and social bias-related tasks. In our attempt at language modeling, we experiment with efficient, tokenization-based adaptation techniques. Our TweetyIta shows encouraging results after training on as little as 5B tokens from natural Italian corpora. We benchmark an extensive list of models against ItaEval and find several interesting insights. Surprisingly, i) models trained predominantly on English data dominate the leaderboard; ii) TweetyIta is competitive against other forms of adaptation or inherently monolingual models; iii) natural language understanding tasks are especially challenging for current models. We release code and data at https://github.com/RiTA-nlp/ita-eval and host a live leaderboard at https://huggingface.co/spaces/RiTA-nlp/ita-eval.