Connectionist Speaker Normalization with Generalized Resource Allocating Networks

Furlanello, Cesare; Giuliani, Diego; Trentin, Edmondo

Inter-speaker variability is one of the principal sources of error in automatic speech recognition. This paper presents a rapid speaker-normalization technique based on neural network spectral mapping. The neural network serves as a front end to a speaker-dependent, HMM-based continuous speech recognition system and normalizes the input acoustic data from a new speaker. The spectral difference between speakers can be reduced using a limited amount of new acoustic data (40 phonetically rich sentences). The recognition error on phone units from the acoustic-phonetic continuous speech corpus APASCI is decreased with an adaptability ratio of 25%. We used local basis networks of elliptical Gaussian kernels, with recursive allocation of units and on-line optimization of parameters (the generalized resource allocating network, or GRAN, model). For this application, the model included a linear term. The results compare favorably with multivariate linear mapping based on constrained orthonormal transformations.
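As an illustration of the kind of mapping described above, the following minimal sketch computes the output of a local basis network with elliptical Gaussian kernels plus a linear term. It is not the authors' implementation: the function name `gran_forward`, the parameter names, and the choice of axis-aligned ellipses (one width per input dimension) are assumptions made for this example, and the recursive unit allocation and on-line parameter optimization are not shown.

```python
import math


def gran_forward(x, centers, widths, weights, A, b):
    """Map one input spectral vector x to an output vector.

    Hypothetical parameterization (assumed for illustration):
      centers[j], widths[j] : center and per-dimension widths of unit j
      weights[j]            : output weight vector of unit j
      A, b                  : matrix and offset of the linear term
    """
    n_out = len(b)
    # Linear term: y = A x + b
    y = [b[i] + sum(A[i][d] * x[d] for d in range(len(x))) for i in range(n_out)]
    # Add the contribution of each elliptical Gaussian unit.
    for c, s, w in zip(centers, widths, weights):
        # Axis-aligned ellipse: squared distance scaled per dimension.
        dist2 = sum(((xd - cd) / sd) ** 2 for xd, cd, sd in zip(x, c, s))
        activation = math.exp(-0.5 * dist2)
        for i in range(n_out):
            y[i] += w[i] * activation
    return y
```

With no Gaussian units the function reduces to the linear map alone, so in this sketch the kernels act as local corrections on top of the linear term.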