A Tutorial on Connectionist and Hybrid HMM/Connectionist Systems for Speech Recognition

Trentin, Edmondo; Gori, Marco

Although Automatic Speech Recognition (ASR) systems based on hidden Markov models (HMMs) are popular and effective under many circumstances, they suffer from limitations that restrict the applicability of ASR technology in the real world. Between the end of the 1980s and the beginning of the 1990s, several researchers began applying Artificial Neural Networks (ANNs) to ASR, with the aim of overcoming these limitations. ANNs achieved significant results on reduced-scale tasks, e.g. phoneme recognition, but they largely failed to deal with long time sequences of speech signals. As a consequence, 'hybrid' systems were proposed, combining HMMs and ANNs within a single architecture in order to take advantage of the properties of both. This tutorial reviews some fundamental concepts of ASR, HMMs and ANNs for ASR. It then surveys the major hybrid models for ASR, summarizing a variety of architectures, novel training algorithms and experimental results from a highly specialized and non-homogeneous literature. Five classes of hybrid systems are presented: (i) ANNs that emulate HMMs; (ii) connectionist estimation of posterior probabilities within an HMM; (iii) joint HMM/ANN optimization over a single, overall training criterion; (iv) connectionist vector quantization for discrete HMMs; (v) ANNs for 'rescoring' the HMM hypotheses.