Word boundary agreement to combine multi-microphone hypotheses in distant speech recognition
Guerrero Flores, Cristina Maritza;Omologo, Maurizio
2014-01-01
Abstract
In this paper we propose a technique for combining hypotheses generated in a multi-microphone setting, which exploits complementarity and collective agreement among the ASR outputs of different channels. The technique draws upon the information encoded in the available set of word lattices. As a first step, we identify word boundaries at which a comprehensive inter-channel agreement is found; these boundaries are then used to reduce the global hypothesis search space. Global word posterior probabilities are estimated for the candidate words associated with each of the bounded segments. As a result, a single combined confusion network is generated from the multiple lattices. This approach offers a novel perspective on state-of-the-art solutions based on confusion network combination. Promising results were obtained from an experimental evaluation in a simulated domestic environment equipped with a distributed microphone network. The development and test sets were simulated using real impulse responses estimated for a large set of microphone-speaker position pairs.
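The pipeline summarized in the abstract (find boundaries on which the channels agree, split the utterance at those boundaries, then pool word posteriors per segment) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the flat `(word, start, end, posterior)` arc format standing in for a word lattice, the tolerance `tol`, the function names `agreed_boundaries` and `combine`, and the plain averaging of posteriors across channels. Each segment is also reduced to a single bag of candidate words, which is a simplification of a true confusion network.

```python
from collections import defaultdict

# Hypothetical input: one list per channel of (word, start_s, end_s, posterior) arcs.

def agreed_boundaries(channels, tol=0.03):
    """Boundary times of channel 0 that every other channel matches within tol seconds."""
    def times(arcs):
        return sorted({t for _, s, e, _ in arcs for t in (s, e)})
    ref = times(channels[0])
    others = [times(arcs) for arcs in channels[1:]]
    return [t for t in ref
            if all(any(abs(t - u) <= tol for u in o) for o in others)]

def combine(channels, tol=0.03):
    """Pool word posteriors inside each segment bounded by agreed boundaries."""
    bounds = agreed_boundaries(channels, tol)
    network = []
    for left, right in zip(bounds, bounds[1:]):
        scores = defaultdict(float)
        for arcs in channels:
            for word, s, e, p in arcs:
                # Keep arcs lying inside the segment (assumed rule, with tolerance).
                if s >= left - tol and e <= right + tol:
                    scores[word] += p / len(channels)
        total = sum(scores.values())
        if total > 0:
            # Normalize so the candidates of each segment sum to one.
            network.append({w: v / total for w, v in scores.items()})
    return network

# Toy two-channel example (made-up values).
ch1 = [("turn", 0.0, 0.4, 0.9), ("on", 0.4, 0.6, 0.8), ("light", 0.6, 1.0, 0.7)]
ch2 = [("turn", 0.0, 0.41, 0.6), ("on", 0.41, 0.6, 0.9), ("night", 0.6, 1.0, 0.6)]
net = combine([ch1, ch2])
best = [max(slot, key=slot.get) for slot in net]
```

In this toy case the two channels agree on all boundaries, so the utterance splits into three segments, and the last one keeps both `light` and `night` as competing candidates with pooled scores.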