Lexicon Adaptation for Broadcast News Transcription
Bertoldi, Nicola; Federico, Marcello
2001-01-01
Abstract
In this work we investigate methods to extend the lexicon of a broadcast news (BN) speech recognition system in order to minimize the out-of-vocabulary (OOV) word rate. In particular, the OOV word class within the BN trigram language model (LM) is linked to a new unigram LM that is dynamically adapted to cope with language changes over time. LM extensions are evaluated according to the achieved OOV word rate, perplexity, and word error rate. The last criterion implicitly takes into account the quality of the phonetic transcriptions used for the new words. In the experiments reported here, phonetic transcriptions of new words are generated automatically by a phonetic transcriber developed in-house.
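The abstract does not give the model equations, so the following is only a minimal sketch of one common way to link an OOV class in an n-gram LM to a unigram LM over new words: the probability of a new word is taken as the base model's probability of the OOV class in the given context, multiplied by the new word's unigram probability within that class. All names here (`oov_rate`, `train_unigram`, `word_prob`, `base_prob`, the `<oov>` token) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def oov_rate(tokens, lexicon):
    """Fraction of running words in `tokens` that are not in `lexicon`."""
    if not tokens:
        return 0.0
    return sum(1 for w in tokens if w not in lexicon) / len(tokens)


def train_unigram(new_word_counts):
    """Maximum-likelihood unigram distribution over the new words.

    Assumption: no smoothing; re-estimating these counts on recent text
    stands in for the dynamic adaptation mentioned in the abstract.
    """
    total = sum(new_word_counts.values())
    return {w: c / total for w, c in new_word_counts.items()}


def word_prob(word, history, lexicon, base_prob, oov_unigram, oov_token="<oov>"):
    """Probability of `word` given `history` under the extended model.

    In-lexicon words are scored by the base LM directly. New words are
    scored as P(<oov> | history) * P_unigram(word | <oov>).
    `base_prob(word, history)` is a hypothetical callable standing in for
    the trigram LM.
    """
    if word in lexicon:
        return base_prob(word, history)
    return base_prob(oov_token, history) * oov_unigram.get(word, 0.0)


# Toy data, invented purely for illustration.
lexicon = {"the", "news", "today"}
unigram = train_unigram(Counter({"euro": 3, "dotcom": 1}))


def toy_base_prob(word, history):
    # Stand-in for a trigram LM: a constant per word type, ignoring history.
    return 0.1 if word == "<oov>" else 0.2


p_new = word_prob("euro", ("the", "news"), lexicon, toy_base_prob, unigram)
```

Because the unigram probabilities sum to one over the new words, the probability mass that the base model assigns to the OOV class is redistributed among them, and the base model's other probabilities are left unchanged.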