Spectral Mapping: A Comparison of Connectionist Approaches

Trentin, Edmondo; Giuliani, Diego; Furlanello, Cesare
1996-01-01

Abstract

This paper presents two connectionist approaches to spectral mapping for speaker normalization. The first is based on an extended Radial Basis Function network. The second is based on a slightly improved Multi-Layer Perceptron (MLP). The architectures of the models are briefly described, along with their main computational features. Experimental results are reported for four continuous-speech, large-vocabulary, speaker-dependent recognition systems and four test speakers. Only five utterances per speaker were used to train the normalization modules. Network-based normalization is shown to improve the performance of the speaker-dependent recognizers, which are based on Hidden Markov Models. It also compares favorably with the results obtained with a standard linear-regression model. In particular, the generalized MLP gave a 16.9% average word error rate (WER), which represents a considerable 52% WER reduction with respect to the baseline system alone, making it a viable solution for the non-linear, multivariate regression problem under consideration.
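
The 52% figure in the abstract is a relative reduction, not a difference in percentage points. The abstract does not state the baseline WER; the short sketch below only back-computes the value implied by the two reported numbers, assuming the usual definition of relative reduction, (baseline - new) / baseline. The function names are illustrative, and because the inputs are rounded the result is approximate.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of new_wer with respect to baseline_wer, as a fraction."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, reduction: float) -> float:
    """Baseline WER implied by a reported WER and its relative reduction (a fraction)."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must lie in [0, 1)")
    return new_wer / (1.0 - reduction)


# Figures reported in the abstract: 16.9% WER after normalization, 52% relative reduction.
# The baseline below is an inference from those rounded figures, not a reported result.
baseline = implied_baseline(16.9, 0.52)
print(round(baseline, 1))                            # 35.2
print(round(relative_reduction(baseline, 16.9), 2))  # 0.52
```

Under these assumptions the baseline recognizers would have an average WER of roughly 35%.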

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/1192