Deep neural network adaptation for children's and adults' speech recognition
Giuliani, Diego
2014-01-01
Abstract
This paper introduces a novel application of the hybrid deep neural network (DNN) - hidden Markov model (HMM) approach for automatic speech recognition (ASR): targeting groups of speakers of a specific age or gender. We consider three speaker groups: children, adult males, and adult females. Group-specific training of the DNN is investigated and shown to be not always effective when the amount of training data is limited. To overcome this problem, the recent approach of adapting a general DNN to domain- or language-specific data is extended to target age/gender groups in the context of hybrid DNN-HMM systems, consistently reducing the phone error rate by 15-20% relative for the three speaker groups.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.