Spectral Mapping: A Comparison of Connectionist Approaches

Trentin, Edmondo; Giuliani, Diego; Furlanello, Cesare

This paper presents two connectionist approaches to spectral mapping for speaker normalization. The first is based on an extended Radial Basis Function (RBF) network. The second is based on a slightly improved Multi-Layer Perceptron (MLP), referred to below as the generalized MLP. The architectures of the models are briefly described, together with their main computational features. Experimental results are reported for 4 continuous-speech, large-vocabulary, speaker-dependent recognition systems and 4 test speakers. Only 5 utterances per speaker were used to train the normalization modules. Network-based normalization is shown to improve the performance of the speaker-dependent recognizers, which are based on Hidden Markov Models, and it compares favorably with the results obtained with a standard linear-regression model. In particular, the generalized MLP gave a 16.9% average word error rate (WER), which represents a considerable 52% WER reduction with respect to the baseline system alone. This makes it a viable solution for the non-linear, multivariate regression problem under consideration.
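
For orientation, the linear-regression reference point mentioned above can be sketched as a least-squares affine map from a new speaker's spectral frames to those of a reference speaker. This is only an illustrative sketch, not the authors' implementation: the function names, the feature dimension, and the synthetic data are assumptions, and the frames are assumed to be already time-aligned in pairs.

```python
import numpy as np


def fit_linear_mapping(src, tgt):
    """Fit a least-squares affine map from source frames to target frames.

    src, tgt: arrays of shape (n_frames, dim), assumed already paired frame by frame.
    Returns a weight matrix of shape (dim + 1, dim); the last row is the bias.
    """
    # Append a constant column so the fit includes a bias term.
    src_aug = np.hstack([src, np.ones((src.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(src_aug, tgt, rcond=None)
    return weights


def apply_linear_mapping(weights, frames):
    """Map frames of the new speaker into the reference speaker's spectral space."""
    frames_aug = np.hstack([frames, np.ones((frames.shape[0], 1))])
    return frames_aug @ weights


# Synthetic stand-in data (hypothetical): real use would take spectral feature
# vectors extracted from the adaptation utterances of the two speakers.
rng = np.random.default_rng(0)
dim = 8                                   # assumed feature dimension
true_a = rng.normal(size=(dim, dim))      # hidden linear distortion between speakers
true_b = rng.normal(size=dim)             # hidden spectral offset
src_frames = rng.normal(size=(200, dim))
tgt_frames = src_frames @ true_a + true_b

weights = fit_linear_mapping(src_frames, tgt_frames)
mapped = apply_linear_mapping(weights, src_frames)
max_err = float(np.max(np.abs(mapped - tgt_frames)))
```

A connectionist mapper replaces the affine map with a non-linear function (an RBF network or an MLP) trained on the same kind of frame pairs, which is what lets it capture speaker differences that a single linear transform cannot.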