
Fed-EE: Federating Heterogeneous ASR Models using Early-Exit Architectures

Mohamed Nabih Ali
Member of the Collaboration Group
;
Daniele Falavigna
Member of the Collaboration Group
;
Alessio Brutti
Member of the Collaboration Group
2023-01-01

Abstract

Automatic speech recognition models require large amounts of speech recordings for training. However, collecting such data is often cumbersome and raises privacy concerns. Federated learning is widely used as an effective decentralized technique that collaboratively learns a shared model while keeping the data local on client devices. Unfortunately, client devices often have limited computation and communication resources, leading to practical difficulties for large models. In addition, the heterogeneity that characterizes edge devices makes it impractical to federate a single model that fits all the different clients. Unlike the recent literature, where multiple different architectures are used, in this work we propose using early-exiting. This brings two benefits: a single model can be used on a variety of devices, and federating the models is straightforward. Experiments on the public TED-LIUM 3 dataset show that our proposed approach is effective and can be combined with basic federated learning strategies. We also shed light on how to federate self-attention models for speech recognition, for which no established recipe exists in the literature.
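The abstract's core idea can be illustrated with a small sketch: each client trains only as many encoder blocks (plus the corresponding exit decoder) as its capacity allows, and the server averages each parameter over the clients that actually updated it. The function and parameter names below are hypothetical, chosen for illustration; this is not the paper's actual implementation, only a minimal partial-FedAvg sketch under those assumptions.

```python
import numpy as np

def init_model(n_blocks=4, dim=8, seed=0):
    """Global early-exit model: n_blocks encoder blocks, each with its
    own exit decoder (parameters simulated as random vectors)."""
    rng = np.random.default_rng(seed)
    model = {}
    for i in range(n_blocks):
        model[f"block{i}"] = rng.standard_normal(dim)
        model[f"exit{i}"] = rng.standard_normal(dim)
    return model

def client_update(global_model, capacity, lr=0.1):
    """A resource-limited client loads and trains only the first
    `capacity` blocks and their exits (dummy gradient step here)."""
    local = {}
    for i in range(capacity):
        local[f"block{i}"] = global_model[f"block{i}"] - lr
        local[f"exit{i}"] = global_model[f"exit{i}"] - lr
    return local

def federated_average(global_model, client_updates):
    """Average each parameter over the clients that trained it; layers
    no client reached keep their previous global values."""
    new_model = dict(global_model)
    for name in global_model:
        received = [u[name] for u in client_updates if name in u]
        if received:
            new_model[name] = np.mean(received, axis=0)
    return new_model

# Heterogeneous clients: shallow devices stop at exit 1 or 2,
# a capable device trains the full 4-block stack.
model = init_model()
updates = [client_update(model, capacity=c) for c in (1, 2, 4)]
model = federated_average(model, updates)
```

Because every client shares the same architecture up to its exit, aggregation needs no cross-architecture mapping: deeper layers are simply averaged over fewer contributors.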
Files for this item:
File: paper_49.pdf
Access: open access
Type: Post-print document
License: PUBLIC - Creative Commons 3.6
Size: 812.94 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/343747