ItaEval and TweetyIta: A New Extensive Benchmark and Efficiency-First Language Model for Italian

Beatrice Savoldi
2024-01-01

Abstract

Current efforts to develop and benchmark modern, large-scale Italian language models (LMs) are scattered. This paper addresses this fragmentation by introducing two new resources: ItaEval, a comprehensive evaluation suite, and TweetyIta, an efficiency-first language model for Italian. Through ItaEval, we standardize evaluation across language understanding, commonsense and factual knowledge, and social bias-related tasks. For language modeling, we experiment with efficient, tokenization-based adaptation techniques. TweetyIta shows encouraging results after training on as few as 5 billion tokens of natural Italian text. We benchmark an extensive list of models on ItaEval and draw several insights. Surprisingly, i) models trained predominantly on English data dominate the leaderboard; ii) TweetyIta is competitive with other forms of adaptation and with inherently monolingual models; iii) natural language understanding tasks are especially challenging for current models. We release code and data at https://github.com/RiTA-nlp/ita-eval and host a live leaderboard at https://huggingface.co/spaces/RiTA-nlp/ita-eval.
Files in this item:
6_main_long.pdf (open access; license: not specified; 992.34 kB, Adobe PDF)

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/352991