
EFL-PEFT: A communication Efficient Federated Learning framework using PEFT sparsification for ASR

Ali, Mohamed Nabih (Methodology); Falavigna, Daniele (Methodology); Brutti, Alessio (Methodology)

2025-01-01

Abstract

Federated Learning (FL) has garnered substantial interest for training models on speech-based tasks (e.g., automatic speech recognition (ASR) and other speech classification tasks). Recently, fine-tuning pre-trained self-supervised models for these tasks has shown promising performance and has been successfully applied in FL settings. Nevertheless, fine-tuning these architectures is computationally burdensome and unaffordable in many real-time settings. Moreover, the communication cost of transferring all the model parameters for the aggregation stage is critically high. As an alternative, parameter-efficient fine-tuning (PEFT) approaches provide promising performance without changing the backbone of the pre-trained model. PEFT has been fruitfully applied, in a variety of flavours, to ASR in centralized training configurations, while only a few works investigate its use in FL settings. In this paper, we consolidate the use of PEFT for ASR with pre-trained models, demonstrating that it enables efficient FL by reducing the number of parameters to share with respect to full fine-tuning. We also explore combining PEFT with sparsification methods to further reduce communication cost by transmitting only a fraction of the adapter parameters. Additionally, we show that aggregating adapters using FedAvg is compatible with differential privacy, in line with trends observed in other domains. Our proposed approach is supported by experimental analysis on ASR using two public datasets, as well as on intent classification tasks.
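
To make the communication-saving idea concrete, the sketch below shows one way adapter-update sparsification, FedAvg aggregation, and differential-privacy noise could be combined. It is a minimal NumPy illustration of the general technique only; all names and constants (sparsify_topk, fedavg, TOPK_FRACTION, CLIP_NORM, NOISE_MULT, the per-client noise placement) are assumptions for illustration, not the paper's actual implementation.

    # Hedged sketch: top-k sparsification of PEFT adapter updates, plus FedAvg
    # aggregation with optional Gaussian differential-privacy noise.
    # All names and constants below are illustrative assumptions.
    import numpy as np

    ADAPTER_DIM = 1000      # flattened adapter parameter count (assumed)
    NUM_CLIENTS = 4
    TOPK_FRACTION = 0.1     # transmit only 10% of adapter entries (assumed)
    CLIP_NORM = 1.0         # per-client L2 clipping bound for DP (assumed)
    NOISE_MULT = 0.5        # Gaussian noise multiplier for DP (assumed)

    def sparsify_topk(update: np.ndarray, fraction: float) -> np.ndarray:
        """Keep only the largest-magnitude entries of the adapter update."""
        k = max(1, int(fraction * update.size))
        idx = np.argpartition(np.abs(update), -k)[-k:]
        sparse = np.zeros_like(update)
        sparse[idx] = update[idx]
        return sparse

    def clip_and_noise(update: np.ndarray, clip: float, noise_mult: float,
                       rng: np.random.Generator) -> np.ndarray:
        """Clip the update's L2 norm and add Gaussian noise (DP-style)."""
        norm = np.linalg.norm(update)
        clipped = update * min(1.0, clip / (norm + 1e-12))
        return clipped + rng.normal(0.0, noise_mult * clip, size=update.shape)

    def fedavg(updates: list[np.ndarray], weights: list[float]) -> np.ndarray:
        """Weighted average of client adapter updates (FedAvg)."""
        total = sum(weights)
        return sum(w * u for w, u in zip(weights, updates)) / total

    rng = np.random.default_rng(0)

    # Simulated per-client adapter updates after local fine-tuning (random stand-ins).
    client_updates = [rng.normal(size=ADAPTER_DIM) for _ in range(NUM_CLIENTS)]
    client_sizes = [100.0, 80.0, 120.0, 60.0]   # e.g. local dataset sizes

    transmitted = []
    for u in client_updates:
        u = sparsify_topk(u, TOPK_FRACTION)                # reduce communication
        u = clip_and_noise(u, CLIP_NORM, NOISE_MULT, rng)  # differential privacy
        transmitted.append(u)

    global_adapter_delta = fedavg(transmitted, client_sizes)
    print("nonzero entries sent per client:",
          int(TOPK_FRACTION * ADAPTER_DIM), "of", ADAPTER_DIM)
    print("aggregated delta norm:", np.linalg.norm(global_adapter_delta))

Note that only the adapter (PEFT) parameters are exchanged, so the top-k step shrinks an already small payload; whether noise is added per client (local DP, as above) or after aggregation at the server is a design choice not specified here.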

Use this identifier to cite or link to this item: https://hdl.handle.net/11582/357830