ItaEval and TweetyIta: A New Extensive Benchmark and Efficiency-First Language Model for Italian
Beatrice Savoldi
2024-01-01
Abstract
Current development and benchmarking efforts for modern, large-scale Italian language models (LMs) are scattered. This paper situates such efforts by introducing two new resources: ItaEval, a comprehensive evaluation suite, and TweetyIta, an efficiency-first language model for Italian. Through ItaEval, we standardize evaluation across language understanding, commonsense and factual knowledge, and social bias-related tasks. In our attempt at language modeling, we experiment with efficient, tokenization-based adaptation techniques. Our TweetyIta shows encouraging results after training on as little as 5B tokens from natural Italian corpora. We benchmark an extensive list of models against ItaEval and find several interesting insights. Surprisingly, i) models trained predominantly on English data dominate the leaderboard; ii) TweetyIta is competitive against other forms of adaptation or inherently monolingual models; iii) natural language understanding tasks are especially challenging for current models. We release code and data at https://github.com/RiTA-nlp/ita-eval and host a live leaderboard at https://huggingface.co/spaces/RiTA-nlp/ita-eval.