In this work we investigate the usage of TV audio data for cross-language training of multi-lingual acoustic models. We intend to take advantage from the availability of a training speech corpus formed by parallel news uttered in different languages and transmitted over separated audio channels. Spanish, French and Russian phone Hidden Markov Models (HMMs) are bootstrapped using an unsupervised training procedure starting from an Italian set of phone HMMs. The use of confidence measures in order to select the training audio data was also investigated and has proved to be effective. The usage of cross language information, i.e. exploiting the temporal alignment of news in different languages to build news-dependent Language Models (LMs), was also demonstrated to give benefits to the acoustic model training.

Cheap Bootstrap of Multi-Lingual Hidden Markov Models

Falavigna, Giuseppe Daniele;Gretter, Roberto
2011-01-01

Abstract

In this work we investigate the usage of TV audio data for cross-language training of multi-lingual acoustic models. We intend to take advantage from the availability of a training speech corpus formed by parallel news uttered in different languages and transmitted over separated audio channels. Spanish, French and Russian phone Hidden Markov Models (HMMs) are bootstrapped using an unsupervised training procedure starting from an Italian set of phone HMMs. The use of confidence measures in order to select the training audio data was also investigated and has proved to be effective. The usage of cross language information, i.e. exploiting the temporal alignment of news in different languages to build news-dependent Language Models (LMs), was also demonstrated to give benefits to the acoustic model training.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11582/54016
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
social impact