
EFL-PEFT: A communication Efficient Federated Learning framework using PEFT sparsification for ASR

Ali, Mohamed Nabih (Methodology); Falavigna, Daniele (Methodology); Brutti, Alessio (Methodology)

2025-01-01

Abstract

Federated Learning (FL) has garnered substantial interest for training models on speech-based tasks (e.g., automatic speech recognition (ASR) and other speech classification tasks). Recently, fine-tuning pre-trained self-supervised models for these tasks has shown promising performance and has been successfully applied in FL settings. Nevertheless, fine-tuning these architectures is computationally burdensome and unaffordable in many real-time settings. Moreover, the communication cost of transferring all the model parameters for the aggregation stage is critically high. As an alternative, parameter-efficient fine-tuning (PEFT) approaches provide promising performance without changing the backbone of the pre-trained model. PEFT has been fruitfully applied, in a variety of flavours, to ASR in centralized training configurations, while only a few works investigate its use in FL settings. In this paper, we consolidate the use of PEFT for ASR with pre-trained models, demonstrating that it enables efficient FL by reducing the number of parameters to share with respect to full fine-tuning. We also explore combining PEFT with sparsification methods to further reduce communication cost by transmitting only a fraction of the adapter parameters. Additionally, we show that aggregating adapters using FedAvg is compatible with differential privacy, in line with trends observed in other domains. Our proposed approach is supported by experimental analysis on ASR using two public datasets, as well as on intent classification tasks.
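
To make the communication-saving idea concrete, the sketch below shows one way adapter-update sparsification, FedAvg aggregation, and differential-privacy noise could be combined. It is a minimal NumPy illustration of the general technique only; all names and constants (sparsify_topk, fedavg, TOPK_FRACTION, CLIP_NORM, NOISE_MULT, the per-client noise placement) are assumptions for illustration, not the paper's actual implementation.

    # Hedged sketch: top-k sparsification of PEFT adapter updates, plus FedAvg
    # aggregation with optional Gaussian differential-privacy noise.
    # All names and constants below are illustrative assumptions.
    import numpy as np

    ADAPTER_DIM = 1000      # flattened adapter parameter count (assumed)
    NUM_CLIENTS = 4
    TOPK_FRACTION = 0.1     # transmit only 10% of adapter entries (assumed)
    CLIP_NORM = 1.0         # per-client L2 clipping bound for DP (assumed)
    NOISE_MULT = 0.5        # Gaussian noise multiplier for DP (assumed)

    def sparsify_topk(update: np.ndarray, fraction: float) -> np.ndarray:
        """Keep only the largest-magnitude entries of the adapter update."""
        k = max(1, int(fraction * update.size))
        idx = np.argpartition(np.abs(update), -k)[-k:]
        sparse = np.zeros_like(update)
        sparse[idx] = update[idx]
        return sparse

    def clip_and_noise(update: np.ndarray, clip: float, noise_mult: float,
                       rng: np.random.Generator) -> np.ndarray:
        """Clip the update's L2 norm and add Gaussian noise (DP-style)."""
        norm = np.linalg.norm(update)
        clipped = update * min(1.0, clip / (norm + 1e-12))
        return clipped + rng.normal(0.0, noise_mult * clip, size=update.shape)

    def fedavg(updates: list[np.ndarray], weights: list[float]) -> np.ndarray:
        """Weighted average of client adapter updates (FedAvg)."""
        total = sum(weights)
        return sum(w * u for w, u in zip(weights, updates)) / total

    rng = np.random.default_rng(0)

    # Simulated per-client adapter updates after local fine-tuning (random stand-ins).
    client_updates = [rng.normal(size=ADAPTER_DIM) for _ in range(NUM_CLIENTS)]
    client_sizes = [100.0, 80.0, 120.0, 60.0]   # e.g. local dataset sizes

    transmitted = []
    for u in client_updates:
        u = sparsify_topk(u, TOPK_FRACTION)                # reduce communication
        u = clip_and_noise(u, CLIP_NORM, NOISE_MULT, rng)  # differential privacy
        transmitted.append(u)

    global_adapter_delta = fedavg(transmitted, client_sizes)
    print("nonzero entries sent per client:",
          int(TOPK_FRACTION * ADAPTER_DIM), "of", ADAPTER_DIM)
    print("aggregated delta norm:", np.linalg.norm(global_adapter_delta))

Note that only the adapter (PEFT) parameters are exchanged, so the top-k step shrinks an already small payload; whether noise is added per client (local DP, as above) or after aggregation at the server is a design choice not specified here.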

Use this identifier to cite or link to this item: https://hdl.handle.net/11582/357830