Lexicon Adaptation for Broadcast News Transcription

Bertoldi, Nicola;Federico, Marcello
2001-01-01

Abstract

In this work we investigate methods to extend the lexicon of a broadcast news (BN) speech recognition system in order to minimize the out-of-vocabulary (OOV) word rate. In particular, the OOV word class within the BN trigram language model (LM) is linked to a new unigram LM that is dynamically adapted to cope with language changes over time. LM extensions are evaluated according to the achieved OOV word rate, perplexity, and word-error rate. The last criterion implicitly takes into account the quality of the phonetic transcriptions used for the new words. In the experiments presented here, phonetic transcriptions of new words are generated automatically by a phonetic transcriber developed in-house.
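
As an illustration of the quantities named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that linking the OOV class to a unigram LM means scoring a new word as the trigram probability of the OOV class multiplied by the word's unigram probability within that class; the abstract does not state this factorization explicitly. The function names, the toy data, and the unsmoothed relative-frequency estimate are all hypothetical.

```python
from collections import Counter

def oov_rate(tokens, lexicon):
    """Fraction of running words in `tokens` that are missing from `lexicon`."""
    if not tokens:
        return 0.0
    return sum(1 for w in tokens if w not in lexicon) / len(tokens)

def unigram_oov_lm(recent_tokens, lexicon):
    """Relative-frequency unigram over words in recent text that the base
    lexicon lacks (assumption: no smoothing, for brevity)."""
    counts = Counter(w for w in recent_tokens if w not in lexicon)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def new_word_prob(p_oov_given_history, oov_unigram, word):
    """Assumed factorization: P(word | h) = P(OOV | h) * P(word | OOV)."""
    return p_oov_given_history * oov_unigram.get(word, 0.0)

# Toy data, invented purely for illustration.
lexicon = {"the", "news", "today"}
recent = ["the", "euro", "news", "euro", "summit"]
test = ["the", "euro", "summit", "today"]

oov_lm = unigram_oov_lm(recent, lexicon)
extended_lexicon = lexicon | set(oov_lm)

rate_before = oov_rate(test, lexicon)
rate_after = oov_rate(test, extended_lexicon)
```

In this toy setting, re-estimating `oov_lm` from newer text plays the role of the dynamic adaptation over time, while the trigram model itself is left unchanged.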

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/334