Language Model Adaptation through Topic Decomposition and MDI Estimation

Federico, Marcello
2002-01-01

Abstract

This work presents a language model adaptation method that combines the latent semantic analysis framework with the minimum discrimination information (MDI) estimation criterion. In particular, an unsupervised topic model decomposition is built, which allows topic-related word distributions to be inferred from very short adaptation texts. The resulting word distribution is then used to constrain the estimation of a minimum divergence trigram language model. With respect to previous work, implementation details are discussed that make this approach effective for a large-scale application. Experimental results are provided for a digital library indexing task, i.e. the speech transcription of five historical documentary films. By adapting a trigram language model from the very terse content descriptions available for each film, i.e. at most ten words, a relative word error rate reduction of 3.2% was achieved.
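The two steps named in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: the abstract gives no formulas, so the EM folding-in of topic weights, the unigram-ratio scaling form of MDI adaptation, the exponent `gamma`, and all names and probabilities below are assumptions chosen for illustration only.

```python
# Toy sketch (assumed forms, not taken from the paper):
# 1) infer topic weights for a short text with fixed topic-word distributions,
# 2) rescale a background conditional distribution toward the inferred words.

def infer_topic_weights(words, topic_word, iterations=20):
    """EM estimate of P(topic | text), keeping P(word | topic) fixed."""
    n_topics = len(topic_word)
    weights = [1.0 / n_topics] * n_topics
    for _ in range(iterations):
        counts = [0.0] * n_topics
        for w in words:
            post = [weights[t] * topic_word[t][w] for t in range(n_topics)]
            total = sum(post)
            for t in range(n_topics):
                counts[t] += post[t] / total
        weights = [c / len(words) for c in counts]
    return weights

def mdi_adapt(background_cond, background_unigram, adapted_unigram, gamma=0.5):
    """Scale P_bg(w | h) by (P_adapt(w) / P_bg(w)) ** gamma, then renormalize."""
    scaled = {
        w: p * (adapted_unigram[w] / background_unigram[w]) ** gamma
        for w, p in background_cond.items()
    }
    z = sum(scaled.values())
    return {w: p / z for w, p in scaled.items()}

# Hypothetical two-topic model over a four-word vocabulary.
topic_word = [
    {"war": 0.4, "army": 0.4, "film": 0.1, "music": 0.1},
    {"war": 0.1, "army": 0.1, "film": 0.4, "music": 0.4},
]
background_unigram = {"war": 0.25, "army": 0.25, "film": 0.25, "music": 0.25}
background_cond = {"war": 0.2, "army": 0.2, "film": 0.3, "music": 0.3}  # P_bg(w | h)

weights = infer_topic_weights(["war", "army", "war"], topic_word)
adapted_unigram = {
    w: sum(weights[t] * topic_word[t][w] for t in range(len(topic_word)))
    for w in background_unigram
}
adapted = mdi_adapt(background_cond, background_unigram, adapted_unigram)
```

In this toy setting the short text pulls the topic weights toward the first topic, so the adapted distribution gives "war" and "army" more probability than the background model does. A real system would apply the scaling to a full trigram model and normalize per history.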
Use this identifier to cite or link to this document: https://hdl.handle.net/11582/196