EFL-PEFT: A Communication-Efficient Federated Learning framework using PEFT sparsification for ASR
Ali, Mohamed Nabih (Methodology); Falavigna, Daniele (Methodology); Brutti, Alessio (Methodology)
2025-01-01
Abstract
Federated Learning (FL) has garnered substantial interest for training speech-based tasks (e.g., automatic speech recognition (ASR) and other speech classification tasks). Recently, fine-tuning pre-trained self-supervised models for such tasks has shown promising performance and has been successfully applied in FL settings. Nevertheless, fine-tuning these architectures is computationally burdensome and not affordable in several real-time settings. Moreover, the communication costs of transferring all the model parameters for the aggregation stage are critically high. As an alternative, parameter-efficient fine-tuning (PEFT) approaches provide promising performance without changing the backbone of the pre-trained model. PEFT has been fruitfully applied, in a variety of flavours, to ASR in centralized training configurations, while only a few works investigate its use in FL settings. In this paper, we consolidate the use of PEFT for ASR with pre-trained models, demonstrating that it enables efficient FL by reducing the number of parameters to share with respect to full fine-tuning. We also explore combining PEFT with sparsification methods to further reduce the communication cost by transmitting only a fraction of the adapter parameters. Additionally, we show that aggregating adapters using FedAvg is compatible with differential privacy, in line with trends observed in other domains. Our approach is supported by experimental analysis on ASR using two public datasets, as well as on intent classification tasks.
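The two communication-saving ideas in the abstract — sending only adapter updates and sparsifying them before FedAvg aggregation — can be illustrated with a minimal sketch. This is an illustrative assumption, not the paper's actual implementation: the `topk_sparsify` scheme, the 20% fraction, and the per-client sample-count weights are all hypothetical choices made here for clarity.

```python
import random

def topk_sparsify(delta, fraction):
    # Keep only the largest-magnitude fraction of the adapter update; zero the rest.
    # (Illustrative top-k scheme; the paper's exact sparsification may differ.)
    k = max(1, int(fraction * len(delta)))
    keep = set(sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(delta)]

def fedavg(updates, weights):
    # FedAvg: weighted average of client updates, weighted e.g. by local sample counts.
    total = sum(weights)
    return [sum(w * u[i] for w, u in zip(weights, updates)) / total
            for i in range(len(updates[0]))]

random.seed(0)
# Three hypothetical clients, each holding a 10-parameter adapter update
# and transmitting only the top 20% of its entries to the server.
clients = [[random.gauss(0, 1) for _ in range(10)] for _ in range(3)]
sparse = [topk_sparsify(c, 0.2) for c in clients]
global_update = fedavg(sparse, weights=[100, 200, 150])
```

Since only the adapter deltas (and only a fraction of those) leave each client, the per-round communication cost is a small multiple of the adapter size rather than of the full pre-trained backbone.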