Robust Segmental-Connectionist Learning for Recognition of Noisy Speech

Trentin, Edmondo; Matassoni, Marco

Neural network learning theory draws a relationship between "learning with noise" and applying a regularization term in the cost function that is minimized during training on clean (non-noisy) data. Regularizers and other robust training techniques aim to improve the generalization ability of connectionist models by reducing overfitting. This paper presents an application of a variant of the so-called Segmental Neural Network (SNN) to speaker-independent recognition of isolated words in noise. The SNN is enhanced with trainable amplitudes of the activation functions (SNN-TA), which act as regularizers and increase robustness to noise. Experimental results show that, when trained on clean data, the SNN-TA outperforms a standard continuous-density HMM in the recognition of noisy Italian digits, and that its performance remains stable as the signal-to-noise ratio decreases. Current research is focused on extending the present scheme to continuous speech recognition.
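
To make the idea of a trainable activation amplitude concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single sigmoid unit whose output is scaled by a scalar amplitude `amp`, with `amp` learned by gradient descent together with the weight and bias under a squared-error cost. The function names, the learning rate, and the initial values are hypothetical choices made for illustration only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w, b, amp, x):
    """Output of one unit: amp * sigmoid(w * x + b)."""
    return amp * sigmoid(w * x + b)

def sgd_step(w, b, amp, x, target, lr=0.1):
    """One gradient-descent step on the cost 0.5 * (y - target)**2."""
    s = sigmoid(w * x + b)
    err = amp * s - target
    grad_amp = err * s                     # dC/d(amp)
    grad_pre = err * amp * s * (1.0 - s)   # dC/d(w * x + b)
    return (w - lr * grad_pre * x,
            b - lr * grad_pre,
            amp - lr * grad_amp)

def fit(x, target, steps=2000):
    """Train on a single (x, target) pair from assumed initial values."""
    w, b, amp = 0.5, 0.0, 1.0
    for _ in range(steps):
        w, b, amp = sgd_step(w, b, amp, x, target)
    return w, b, amp

if __name__ == "__main__":
    # A target of 2.0 is unreachable for a plain sigmoid (range 0..1),
    # but becomes reachable once the amplitude is trainable.
    w, b, amp = fit(1.0, 2.0)
    print(amp, forward(w, b, amp, 1.0))
```

In this toy setting the amplitude simply rescales the output range of the unit; how the amplitudes are parameterized and shared across units in the SNN-TA is not specified in the abstract.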