Cheap Bootstrap of Multi-Lingual Hidden Markov Models
Falavigna, Giuseppe Daniele; Gretter, Roberto
2011-01-01
Abstract
In this work we investigate the use of TV audio data for cross-language training of multi-lingual acoustic models. We take advantage of the availability of a training speech corpus consisting of parallel news uttered in different languages and transmitted over separate audio channels. Spanish, French and Russian phone Hidden Markov Models (HMMs) are bootstrapped with an unsupervised training procedure starting from an Italian set of phone HMMs. The use of confidence measures to select the training audio data was also investigated and proved effective. Exploiting cross-language information, i.e. the temporal alignment of news in different languages to build news-dependent Language Models (LMs), was also shown to benefit acoustic model training.
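As an illustration of the confidence-based data selection mentioned in the abstract, the following is a minimal sketch, not the authors' actual procedure: it assumes hypothetical segment and confidence structures (`Segment`, `word_confidences`, the 0.7 threshold) and simply keeps automatically transcribed segments whose average word confidence is high enough to be reused for unsupervised retraining of the target-language HMMs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """An automatically transcribed audio segment (hypothetical structure)."""
    audio_path: str
    transcript: str                 # hypothesis produced by the seed (Italian) recognizer
    word_confidences: List[float]   # per-word confidence scores, assumed in [0, 1]


def segment_confidence(seg: Segment) -> float:
    """Average word confidence as a simple segment-level score (one possible choice)."""
    if not seg.word_confidences:
        return 0.0
    return sum(seg.word_confidences) / len(seg.word_confidences)


def select_training_data(segments: List[Segment], threshold: float = 0.7) -> List[Segment]:
    """Keep only segments whose confidence reaches the threshold.

    The retained (audio, transcript) pairs would form the unsupervised
    training set for bootstrapping the target-language phone HMMs.
    """
    return [s for s in segments if segment_confidence(s) >= threshold]


if __name__ == "__main__":
    # Toy usage with two hypothetical Spanish news segments.
    segs = [
        Segment("news_es_001.wav", "el presidente ...", [0.92, 0.88, 0.75]),
        Segment("news_es_002.wav", "la reunion ...", [0.40, 0.55, 0.30]),
    ]
    selected = select_training_data(segs, threshold=0.7)
    print(f"{len(selected)} of {len(segs)} segments retained for retraining")
```

The threshold value and the segment-level scoring function are illustrative assumptions; the paper's actual confidence measure and selection criterion are not specified in this record.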