Task-Oriented Unsupervised / Nearly Unsupervised Acoustic Model Retraining

Facco, Andrea; Gretter, Roberto
2002-01-01

Abstract

This paper describes experiments in using speech data collected through commercial services to perform unsupervised or nearly unsupervised acoustic model retraining. In the first case the speech material is used in a fully unsupervised way, while in the second a small quantity of speech is automatically selected and then manually transcribed. The effectiveness of the approach is measured in terms of the reduction in word (sentence) error rate on a test set disjoint from the retraining data. The tasks considered are connected digits and number plates (essentially alphadigits and numbers). The idea is to retrain the acoustic models by adding to the "baseline" training set only a subset of the newly acquired speech material, obtained by discarding the "worst" part of the speech data. With few or no manual transcriptions, this method yields significant improvements in recognition accuracy while avoiding the manual transcription of large amounts of speech data.
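
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the "worst" part of the data is identified, so everything specific here is an assumption made for illustration: the `Utterance` record, its per-utterance `score` field (higher meaning more reliable), the `keep_fraction` parameter, and both function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    audio_id: str
    hypothesis: str  # automatic transcription produced by the baseline models
    score: float     # ASSUMED quality score; higher means more reliable


def select_for_retraining(utterances, keep_fraction):
    """Keep the best-scoring fraction of the new data, discarding the rest."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    # Rank from most to least reliable according to the assumed score.
    ranked = sorted(utterances, key=lambda u: u.score, reverse=True)
    n_keep = int(len(ranked) * keep_fraction)
    return ranked[:n_keep]


def build_training_set(baseline, new_utterances, keep_fraction=0.7):
    """Add the selected subset, with its automatic labels, to the baseline set."""
    selected = select_for_retraining(new_utterances, keep_fraction)
    return list(baseline) + [(u.audio_id, u.hypothesis) for u in selected]


if __name__ == "__main__":
    baseline = [("base_001", "three five nine")]
    collected = [
        Utterance("new_001", "one two three", 0.95),
        Utterance("new_002", "four four", 0.20),
        Utterance("new_003", "seven zero", 0.80),
        Utterance("new_004", "nine", 0.40),
    ]
    for audio_id, text in build_training_set(baseline, collected, 0.5):
        print(audio_id, text)
```

In the nearly unsupervised setting, a small automatically selected portion would be sent for manual transcription instead of being used with its automatic labels; how that portion is chosen is likewise not specified in the abstract.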

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/565