Adaptive Training Using Simple Target Models

Stemmer, Georg; Brugnara, Fabio; Giuliani, Diego

Adaptive training aims to reduce the influence of speaker, channel, and environment variability on the acoustic models. We describe an acoustic normalization approach to adaptive training. Phonetically irrelevant acoustic variability is reduced at the beginning of the training procedure with respect to a set of target models, which can be a set of HMMs or a Gaussian mixture model (GMM). Constrained maximum likelihood linear regression (CMLLR) is applied to normalize the acoustic features. The normalized data contain less unwanted variability and are used to generate and train the recognition models. Employing a GMM as the target model leads to a text-independent procedure that can be embedded into the acoustic front-end. On a broadcast news transcription task, we obtain relative reductions in word error rate (WER) of 7.8% in the first recognition pass over a conventionally trained system and of 3.4% in the second recognition pass over a system trained with speaker adaptive training (SAT).
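
To illustrate the idea of normalizing features toward a GMM target model, the following is a minimal sketch, not the authors' implementation. It evaluates the quantity that a CMLLR-style affine feature transform seeks to maximize: the log-likelihood of the transformed frames under the target GMM plus the log-determinant (Jacobian) term of the transform. Two simplifications are assumptions made here for brevity: the transform is restricted to a per-dimension scale and offset, and the GMM has diagonal covariances. All function names and the toy data are hypothetical.

```python
import math


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of the frames under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        # Log of (component weight * Gaussian density) for every component.
        comp = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            comp.append(ll)
        # Log-sum-exp over components for numerical stability.
        top = max(comp)
        total += top + math.log(sum(math.exp(c - top) for c in comp))
    return total


def normalize(frames, scale, offset):
    """Apply the affine map x -> scale * x + offset to each dimension.

    Assumption: a diagonal transform; a full matrix is also possible.
    """
    return [[a * xi + b for xi, a, b in zip(x, scale, offset)] for x in frames]


def cmllr_objective(frames, scale, offset, weights, means, variances):
    """Target-model log-likelihood of the normalized frames plus the
    Jacobian term, i.e. (number of frames) * log|det A| for diagonal A."""
    log_det = sum(math.log(abs(a)) for a in scale)
    normalized = normalize(frames, scale, offset)
    return (gmm_log_likelihood(normalized, weights, means, variances)
            + len(frames) * log_det)


# Toy 1-D target GMM and frames from a "speaker" whose data are shifted by +3.
weights, means, variances = [0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]]
frames = [[2.0], [4.0], [2.5], [3.5]]

identity = cmllr_objective(frames, [1.0], [0.0], weights, means, variances)
shifted = cmllr_objective(frames, [1.0], [-3.0], weights, means, variances)
print(shifted > identity)
```

Because the objective depends only on the frames and the GMM, no transcription is needed to score a candidate transform, which is what makes a GMM target text-independent. In practice the transform parameters would be estimated iteratively rather than chosen by hand as in this toy comparison.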