This paper investigates the integration of Heteroscedastic Linear Discriminant Analysis (HLDA) into adaptively trained speech recognizers. Two approaches are compared: the first is a variant of CMLLR-SAT; the second is based on our previously introduced method, Constrained Maximum-Likelihood Speaker Normalization (CMLSN). For the latter, both the HLDA projection and the speaker-specific normalization transformations are estimated with respect to a set of simple target models. We also investigate whether additional robustness can be achieved by estimating HLDA on normalized data. Experimental results are provided for a broadcast news task and a collection of parliamentary speeches. The proposed methods yield relative reductions in word error rate (WER) of 8% over an adapted baseline system that already includes an HLDA transform. On both tasks, the best performance is achieved by the CMLSN-based algorithm. Compared with the combination of HLDA and CMLLR-SAT, this method considerably reduces computational effort and achieves a significantly lower WER.