Federating dynamic models using early-exit architectures for automatic speech recognition on heterogeneous clients

Ali, Mohamed Nabih; Falavigna, Daniele; Brutti, Alessio
2025-01-01

Abstract

Automatic speech recognition models require large amounts of speech recordings for training. However, the collection of such data is often cumbersome and raises privacy concerns. Federated learning has been widely used as an effective decentralized technique that collaboratively learns a shared prediction model while keeping the data local on different clients. Unfortunately, client devices often feature limited computational and communication resources, leading to practical difficulties for large models. In addition, the heterogeneity that characterizes edge devices makes it sub-optimal to generate a single model that fits all of them. In contrast to recent literature, where multiple models with different architectures are used, we propose using dynamic architectures which, employing early-exit solutions, can adapt their processing (i.e., the number of traversed layers) depending on the input and on the operating conditions. This solution falls within the realm of partial training methods and brings two benefits: ❶ a single model is used on a variety of devices, and ❷ federating the models after local training is straightforward. Experiments on public datasets show that our proposed approach is effective and can be combined with basic federated learning strategies.
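
Since no implementation accompanies this record, the following is a minimal, hypothetical PyTorch sketch of the mechanism the abstract describes: an encoder with early-exit heads whose forward pass stops at a device-dependent depth, plus a parameter-wise averaging step that federates clients which trained different prefixes of the same model. All class names, dimensions, and the exit placement are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class EarlyExitASR(nn.Module):
    """Transformer encoder with a CTC-style head after every `exit_every` layers."""

    def __init__(self, feat_dim=80, d_model=256, n_layers=12,
                 exit_every=3, vocab_size=32):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(n_layers)]
        )
        # One lightweight classification head per exit point.
        self.exits = nn.ModuleDict({
            str(i): nn.Linear(d_model, vocab_size)
            for i in range(exit_every - 1, n_layers, exit_every)
        })

    def forward(self, x, max_depth=None):
        # Traverse only the first `max_depth` layers (the device's compute
        # budget) and collect the logits of every exit met along the way.
        h = self.proj(x)
        depth = max_depth if max_depth is not None else len(self.layers)
        logits = {}
        for i, layer in enumerate(self.layers[:depth]):
            h = layer(h)
            if str(i) in self.exits:
                logits[str(i)] = self.exits[str(i)](h)
        return logits  # {exit index: (batch, time, vocab_size) tensor}


def federate(client_states):
    # Parameter-wise averaging over heterogeneous clients: each parameter is
    # averaged over the clients that actually trained (and uploaded) it.
    merged = {}
    for state in client_states:
        for name, tensor in state.items():
            merged.setdefault(name, []).append(tensor)
    return {name: torch.stack(ts).mean(dim=0) for name, ts in merged.items()}


if __name__ == "__main__":
    model = EarlyExitASR()
    x = torch.randn(2, 100, 80)                  # (batch, frames, features)
    out = model(x, max_depth=6)                  # constrained device: 6 layers
    print({k: v.shape for k, v in out.items()})  # exits "2" and "5" fire
```

Under this scheme, a constrained client calls `model(x, max_depth=6)`, trains only the layers and exit heads it actually executed, and uploads just those parameters. Because every client holds a prefix of the same shared architecture, `federate` reduces to a plain average over whichever clients trained each parameter, which is why the abstract describes the federation step as straightforward.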

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/363547