An on-line incremental speaker adaptation technique for audio stream transcription

Giuliani, Diego;Brugnara, Fabio
2013-01-01

Abstract

This paper proposes a novel on-line incremental speaker adaptation technique for real-time transcription applications such as automatic closed-captioning of live TV programs. Unlike previously proposed methods, the technique does not operate at the utterance level: speaker change detection, speaker clustering, and speaker adaptation are all performed on short chunks of the incoming audio signal. Incremental adaptation based on feature-space maximum likelihood linear regression (fMLLR) is conducted with respect to a Gaussian mixture model (GMM) that models the acoustic training data. Individual speakers are represented by fMLLR transforms, which are used both for speaker clustering and for speaker adaptation. Speech recognition experiments show that the proposed technique is effective: embedded in an on-line transcription system applied to television news broadcasts, it yields a 6% relative reduction in word error rate (WER) with respect to a non-adaptive baseline system.
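
To illustrate how an fMLLR transform can stand in for a speaker, the sketch below scores one short chunk of feature vectors against several candidate transforms and picks the best one. It is only an assumption-laden illustration, not the authors' algorithm: the abstract does not say how transforms are estimated or compared, so the maximum-likelihood selection rule, the diagonal-covariance GMM, and all names (`chunk_score`, `best_speaker`, the toy transforms) are hypothetical. The one standard ingredient is that an affine feature transform x' = A x + b contributes a log|det A| term to the log-likelihood of each frame.

```python
import math

def apply_affine(A, b, x):
    """Return A x + b for a square matrix A (list of rows) and vectors b, x."""
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]

def log_abs_det(A):
    """log|det A| by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in A]
    n = len(m)
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0.0:
            raise ValueError("singular transform")
        m[c], m[p] = m[p], m[c]
        total += math.log(abs(m[c][c]))
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return total

def gmm_loglik(gmm, x):
    """Log-likelihood of x under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples (assumed layout).
    """
    terms = []
    for w, mu, var in gmm:
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        terms.append(ll)
    mx = max(terms)  # log-sum-exp for numerical stability
    return mx + math.log(sum(math.exp(t - mx) for t in terms))

def chunk_score(gmm, transform, chunk):
    """Total log-likelihood of a chunk after the affine feature transform."""
    A, b = transform
    jac = log_abs_det(A)  # Jacobian term, added once per frame
    return sum(gmm_loglik(gmm, apply_affine(A, b, x)) + jac for x in chunk)

def best_speaker(gmm, transforms, chunk):
    """Hypothetical rule: pick the transform that best explains the chunk."""
    return max(transforms, key=lambda s: chunk_score(gmm, transforms[s], chunk))

# Toy data, invented purely for illustration.
gmm = [(1.0, [0.0, 0.0], [1.0, 1.0])]
identity = [[1.0, 0.0], [0.0, 1.0]]
transforms = {"spk_a": (identity, [0.0, 0.0]), "spk_b": (identity, [-3.0, -3.0])}
chunk = [[3.0, 3.1], [2.9, 3.0]]
print(best_speaker(gmm, transforms, chunk))
```

In an incremental setting one would presumably also re-estimate the selected transform from the accumulated chunks and create a new speaker entry when no existing transform fits well; neither step is shown here.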

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/195012