The regularized SNN-TA model for recognition of noisy speech

Trentin, Edmondo; Matassoni, Marco
2000-01-01

Abstract

The Segmental Neural Network (SNN) architecture was introduced at BBN by Zavaliagkos et al. for rescoring the N-best hypotheses yielded by a standard continuous-density hidden Markov model (CDHMM) applied to automatic speech recognition. In this paper, an enhanced connectionist model, the SNN with trainable amplitude of activation functions (SNN-TA), is first used in place of the CDHMM to recognize isolated words. A Viterbi-based segmentation, relying on the level-building algorithm, is then introduced; it can be combined with the SNN-TA to obtain a hybrid framework for continuous speech recognition. The approach is applied to the recognition of isolated digits collected in a real car environment under several noisy conditions (traffic, speed, road conditions, etc.) using a microphone placed far from the talker. We stress that robustness to noise can be increased by improving the generalization capabilities of the speech recognizer. From this perspective, while CDHMMs lack a proper regularization theory, a regularized SNN-TA model is discussed that yields effective generalization and noise tolerance, outperforming the CDHMM on the noisy task under consideration.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/15