Federating dynamic models using early-exit architectures for automatic speech recognition on heterogeneous clients

Ali, Mohamed Nabih; Falavigna, Daniele; Brutti, Alessio
2025-01-01

Abstract

Automatic speech recognition models require large amounts of speech recordings for training. However, the collection of such data is often cumbersome and raises privacy concerns. Federated learning has been widely used as an effective decentralized technique that collaboratively learns a shared prediction model while keeping the data local on different clients. Unfortunately, client devices often feature limited computational and communication resources, leading to practical difficulties for large models. In addition, the heterogeneity that characterizes edge devices makes it sub-optimal to generate a single model that fits all of them. In contrast to recent literature, where multiple models with different architectures are used, we propose using dynamic architectures which, employing early-exit solutions, can adapt their processing (i.e., the number of traversed layers) depending on the input and on the operating conditions. This solution falls within the realm of partial training methods and brings two benefits: ❶ a single model is used on a variety of devices, and ❷ federating the models after local training is straightforward. Experiments on public datasets show that our proposed approach is effective and can be combined with basic federated learning strategies.
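
Since no implementation accompanies this record, the following is a minimal, hypothetical PyTorch sketch of the mechanism the abstract describes: an encoder with early-exit heads whose forward pass stops at a device-dependent depth, plus a parameter-wise averaging step that federates clients which trained different prefixes of the same model. All class names, dimensions, and the exit placement are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class EarlyExitASR(nn.Module):
    """Transformer encoder with a CTC-style head after every `exit_every` layers."""

    def __init__(self, feat_dim=80, d_model=256, n_layers=12,
                 exit_every=3, vocab_size=32):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(n_layers)]
        )
        # One lightweight classification head per exit point.
        self.exits = nn.ModuleDict({
            str(i): nn.Linear(d_model, vocab_size)
            for i in range(exit_every - 1, n_layers, exit_every)
        })

    def forward(self, x, max_depth=None):
        # Traverse only the first `max_depth` layers (the device's compute
        # budget) and collect the logits of every exit met along the way.
        h = self.proj(x)
        depth = max_depth if max_depth is not None else len(self.layers)
        logits = {}
        for i, layer in enumerate(self.layers[:depth]):
            h = layer(h)
            if str(i) in self.exits:
                logits[str(i)] = self.exits[str(i)](h)
        return logits  # {exit index: (batch, time, vocab_size) tensor}


def federate(client_states):
    # Parameter-wise averaging over heterogeneous clients: each parameter is
    # averaged over the clients that actually trained (and uploaded) it.
    merged = {}
    for state in client_states:
        for name, tensor in state.items():
            merged.setdefault(name, []).append(tensor)
    return {name: torch.stack(ts).mean(dim=0) for name, ts in merged.items()}


if __name__ == "__main__":
    model = EarlyExitASR()
    x = torch.randn(2, 100, 80)                  # (batch, frames, features)
    out = model(x, max_depth=6)                  # constrained device: 6 layers
    print({k: v.shape for k, v in out.items()})  # exits "2" and "5" fire
```

Under this scheme, a constrained client calls `model(x, max_depth=6)`, trains only the layers and exit heads it actually executed, and uploads just those parameters. Because every client holds a prefix of the same shared architecture, `federate` reduces to a plain average over whichever clients trained each parameter, which is why the abstract describes the federation step as straightforward.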

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/363547