Task-Oriented Unsupervised / Nearly Unsupervised Acoustic Model Retraining
Facco, Andrea; Gretter, Roberto
2002-01-01
Abstract
This paper describes experiments in using speech data, collected by means of commercial services, to perform unsupervised or nearly unsupervised acoustic model retraining. In the first case the speech material is used in a fully unsupervised way, while in the second a small quantity of speech is automatically selected and then manually transcribed. The effectiveness of the approach is measured as the reduction in word (sentence) error rate on a test set disjoint from the retraining data. The tasks considered here are connected digits and number plates (basically alphadigits and numbers). The idea consists of retraining the acoustic models by adding to the "baseline" training set only a subset of the newly acquired speech material, obtained by discarding the "worst" part of the speech data. With few or no manual transcriptions, this method yields significant improvements in recognition accuracy without the need to manually transcribe large amounts of speech data.
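The abstract does not state how the "worst" part of the new speech data is identified. The following is a minimal sketch, assuming each newly acquired utterance carries the recognizer's hypothesis (used as its transcription in the unsupervised case) and a per-utterance confidence score; that score, and the names `Utterance`, `select_subset`, `build_retraining_set` and `discard_fraction`, are hypothetical and not taken from the paper. It shows the selection step and the word and sentence error rates used to measure the effect of retraining; acoustic model training itself is out of scope.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    hypothesis: list[str]  # recognizer output, used as the transcription (unsupervised case)
    confidence: float      # HYPOTHETICAL per-utterance score; higher means more reliable


def select_subset(utterances: list[Utterance], discard_fraction: float) -> list[Utterance]:
    """Drop the `discard_fraction` of utterances with the lowest (assumed) confidence."""
    if not 0.0 <= discard_fraction < 1.0:
        raise ValueError("discard_fraction must be in [0, 1)")
    ranked = sorted(utterances, key=lambda u: u.confidence, reverse=True)
    n_discard = int(len(ranked) * discard_fraction)
    return ranked[: len(ranked) - n_discard]


def build_retraining_set(baseline: list[Utterance], new_data: list[Utterance],
                         discard_fraction: float) -> list[Utterance]:
    """Baseline training set plus the retained part of the newly acquired speech."""
    return baseline + select_subset(new_data, discard_fraction)


def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of a reference word
                         cur[j - 1] + 1,          # insertion of a hypothesis word
                         prev[j - 1] + (r != h))  # substitution (or match, cost 0)
        prev = cur
    return prev[-1]


def error_rates(pairs: list[tuple[list[str], list[str]]]) -> tuple[float, float]:
    """Return (WER, SER) over (reference, hypothesis) pairs from a test set."""
    total_words = sum(len(ref) for ref, _ in pairs)
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    wrong_sentences = sum(1 for ref, hyp in pairs if ref != hyp)
    return errors / total_words, wrong_sentences / len(pairs)
```

In this sketch the discarded fraction is a free parameter; the abstract does not report which value, or which selection criterion, the authors used. Comparing `error_rates` on the same held-out test set before and after retraining gives the kind of WER/SER reduction the paper reports.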