
LDASR: An Experimental Study on Layer Drop Using Conformer-Based Architecture

Abdul Hannan, Alessio Brutti, Daniele Falavigna
2024-01-01

Abstract

The development of large models has produced substantial progress in the Automatic Speech Recognition (ASR) domain, but these models lack the architectural adaptability to perform optimally in low-resource settings. Static model reduction (SMR) techniques are effective at lowering the computational budget, but do not provide architectures that can adapt to the performance traits of each device. Additionally, SMR techniques require re-training to perform efficiently under a modified computational budget. Alternatively, models trained with dynamic approaches, such as Layer Drop, have the inherent ability to scale their architecture at inference time. However, to the best of our knowledge, their behavior under different computational resource settings has not been investigated. In this work, we perform an exhaustive analysis of training- and inference-phase Layer Dropping using different values of the dropping probability, and experimentally quantify the performance-computation trade-off. In addition, we evaluate three different dropping strategies on the LibriSpeech and TED-LIUM corpora and, for the same dropping amount, achieve a 4.93% reduction in word error rate (WER) compared to the state of the art. Furthermore, we provide a detailed comparison with early-exit and small-conformer solutions. Finally, we analyze the inclusion of a scaling factor proportional to the dropping amount and the retention of Layer Normalization in the conformer module during training.
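The abstract describes training with stochastic Layer Dropping and the optional use of a scaling factor tied to the dropping amount. A minimal sketch of that idea follows; the function name, the toy additive "layers", and the exact scaling rule are illustrative assumptions, not the paper's implementation.

```python
import random

def forward_with_layer_drop(x, layers, p_drop, training=True, scale=True):
    """Apply a stack of residual-style layers with stochastic Layer Drop.

    During training, each layer is skipped entirely with probability
    p_drop, leaving only the identity path. At inference, all layers run,
    and (optionally) each layer's residual contribution is scaled by the
    keep probability (1 - p_drop), mirroring the scaling-factor idea.
    """
    for layer in layers:
        if training and random.random() < p_drop:
            continue  # drop this layer: identity path only
        out = layer(x)
        if scale and not training:
            # Hypothetical scaling: weight the layer's residual
            # contribution by the keep probability.
            out = x + (1.0 - p_drop) * (out - x)
        x = out
    return x
```

With toy layers such as `lambda v: v + 1.0`, setting `p_drop=1.0` in training mode reduces the stack to the identity, while inference with `scale=True` attenuates each layer's contribution proportionally to the dropping amount.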
Files in this item:
File: _Modified_as_per_Reviews__LDASR__An_Experimental_Study_on_Layer_Drop_using_Conformer_based_Architecture.pdf
Access: authorized users only
License: NOT PUBLIC - Private/restricted access
Size: 1.46 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/354627