Model Selection of Combined Neural Nets for Speech Recognition, Chapter 9

Furlanello, Cesare; Giuliani, Diego; Merler, Stefano; Trentin, Edmondo

Finding criteria for choosing a model that matches the problem and the available data, and that gives optimal future performance, is a crucial issue in practical applications. It should not be underestimated when model combination is proposed to solve a complex regression or classification task. How can it be ensured that each specialized model has been trained with enough material and that the aggregate model has the optimal structure for reducing error on novel inputs? What if a key requirement is minimization of training material and time? This chapter introduces bootstrap error estimation for automatic model selection in combined networks; the resulting model is embedded in the acoustic front-end of an automatic speech recognition system based on hidden Markov models. The method is evaluated in two applications: a large-vocabulary (10,000 words) continuous speech recognition task, and digit recognition over noisy telephone lines. Bootstrap estimates of minimum MSE allow selection of regression models that improve system recognition performance. The procedure allows a flexible strategy for dealing with inter-speaker variability without requiring an additional validation set. Recognition results are compared for linear, generalized Radial Basis Function, and Multilayer Perceptron network architectures, and with system re-training methods.
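
As an illustration of how bootstrap error estimates can drive model selection without a separate validation set, the following is a minimal sketch and not the chapter's actual procedure. It assumes a plain out-of-bag bootstrap estimate of MSE and two toy regression models (a constant and a straight line) standing in for the network architectures. All names (`fit_constant`, `fit_linear`, `bootstrap_mse`, `select_model`) and the synthetic data are hypothetical.

```python
import random


def fit_constant(xs, ys):
    """Toy model 1 (assumed for illustration): predict the mean target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Toy model 2 (assumed for illustration): least-squares straight line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        # Degenerate resample with a single distinct x: fall back to the mean.
        return lambda x: my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def bootstrap_mse(fit, xs, ys, n_boot=200, seed=0):
    """Out-of-bag bootstrap estimate of MSE for one fitting procedure."""
    rng = random.Random(seed)
    n = len(xs)
    errors = []
    for _ in range(n_boot):
        # Resample the training set with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        # Points never drawn in this replicate serve as test points.
        held_out = [i for i in range(n) if i not in chosen]
        if not held_out:
            continue
        model = fit([xs[i] for i in idx], [ys[i] for i in idx])
        mse = sum((model(xs[i]) - ys[i]) ** 2 for i in held_out) / len(held_out)
        errors.append(mse)
    return sum(errors) / len(errors)


def select_model(candidates, xs, ys, **kwargs):
    """Return the candidate name with the smallest bootstrap MSE, plus all scores."""
    scores = {name: bootstrap_mse(fit, xs, ys, **kwargs)
              for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic, clearly linear data (an assumption made only for this demo).
data_rng = random.Random(1)
xs = [i / 10 for i in range(40)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0.0, 0.3) for x in xs]

best, scores = select_model(
    {"constant": fit_constant, "linear": fit_linear}, xs, ys
)
print(best, scores)
```

Every data point is used both for fitting (in some replicates) and for error estimation (in others), which is why no extra validation set is required in this sketch. Other bootstrap estimators could be substituted inside `bootstrap_mse` without changing the selection step.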