
LDASR: An Experimental Study on Layer Drop Using Conformer-Based Architecture

Abdul Hannan, Alessio Brutti, Daniele Falavigna
2024-01-01

Abstract

The development of large models has produced substantial progress in the Automatic Speech Recognition (ASR) domain, but these models lack the architectural adaptability to perform optimally in low-resource settings. Static model reduction (SMR) techniques are effective at lowering the computational budget, but do not provide architectures that can adapt to the performance traits of each device. Additionally, SMR techniques require re-training to perform efficiently under a modified computational budget. Alternatively, models trained with dynamic approaches, such as Layer Drop, have the inherent ability to scale their architecture at inference time. However, to the best of our knowledge, their behavior under different computational resource settings has not been investigated. In this work, we perform an exhaustive analysis of training- and inference-phase Layer Dropping using different values of the dropping probability, and experimentally quantify the performance-computation trade-off. In addition, we evaluate three different dropping strategies on the LibriSpeech and TED-LIUM corpora and, for the same dropping amount, achieve a 4.93% reduction in word error rate (WER) compared to the state of the art. Furthermore, we provide a detailed comparison with early-exit and small-conformer solutions. Finally, we analyze the inclusion of a scaling factor proportional to the dropping amount and the retention of Layer Normalization in the conformer module during training.
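The abstract describes training with stochastic Layer Dropping and the optional use of a scaling factor tied to the dropping amount. A minimal sketch of that idea follows; the function name, the toy additive "layers", and the exact scaling rule are illustrative assumptions, not the paper's implementation.

```python
import random

def forward_with_layer_drop(x, layers, p_drop, training=True, scale=True):
    """Apply a stack of residual-style layers with stochastic Layer Drop.

    During training, each layer is skipped entirely with probability
    p_drop, leaving only the identity path. At inference, all layers run,
    and (optionally) each layer's residual contribution is scaled by the
    keep probability (1 - p_drop), mirroring the scaling-factor idea.
    """
    for layer in layers:
        if training and random.random() < p_drop:
            continue  # drop this layer: identity path only
        out = layer(x)
        if scale and not training:
            # Hypothetical scaling: weight the layer's residual
            # contribution by the keep probability.
            out = x + (1.0 - p_drop) * (out - x)
        x = out
    return x
```

With toy layers such as `lambda v: v + 1.0`, setting `p_drop=1.0` in training mode reduces the stack to the identity, while inference with `scale=True` attenuates each layer's contribution proportionally to the dropping amount.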
Files in this item:
File: _Modified_as_per_Reviews__LDASR__An_Experimental_Study_on_Layer_Drop_using_Conformer_based_Architecture.pdf
Access: authorized users only
License: NOT PUBLIC - Private/restricted access
Size: 1.46 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/354627