IRIS Institutional Research Information System

FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian

Sara Papi;Marco Gaido;Luisa Bentivogli;Alessio Brutti;Mauro Cettolo;Roberto Gretter;Marco Matassoni;Mohamed Nabih;Matteo Negri

2025-01-01

Abstract

The development of speech foundation models (SFMs) like Whisper and SeamlessM4T has significantly advanced the field of speech processing. However, their closed nature, with inaccessible training data and code, poses major challenges for reproducibility and fair evaluation. While other domains have made substantial progress toward open science by developing fully transparent models trained on open-source (OS) code and data, similar efforts in speech processing remain limited. To fill this gap, we introduce FAMA, the first family of open-science SFMs for English and Italian, trained on 150k+ hours of OS speech data. Moreover, we present a new dataset containing 16k hours of cleaned and pseudo-labeled speech for both languages. Results show that FAMA achieves performance competitive with existing SFMs while being up to 8 times faster. All artifacts, including the codebase, datasets, and models, are released under OS-compliant licenses, promoting openness in speech technology research. The FAMA collection is available at: https://huggingface.co/collections/FBK-MT/fama-683425df3fb2b3171e0cdc9e

Year

2025

Appears in the categories:

4.1 Contribution in conference proceedings

Files in this item:

File	Size	Format
80_main_long.pdf (open access; description: paper open access; license: Creative Commons)	231.09 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/363549