Word boundary agreement to combine multi-microphone hypotheses in distant speech recognition

Guerrero Flores, Cristina Maritza;Omologo, Maurizio
2014-01-01

Abstract

In this paper we propose a technique for combining hypotheses generated in a multi-microphone setting, which exploits complementarity and collective agreement among the ASR outputs of different channels. The technique draws upon the information encoded in the available set of word lattices. As a first step, we identify word boundaries at which comprehensive inter-channel agreement is found; these boundaries are then used to reduce the global hypothesis search space. Global word posterior probabilities are estimated for the candidate words associated with each of the bounded segments. As a result, a single combined confusion network is generated from the multiple lattices. This approach offers a novel perspective on state-of-the-art solutions based on confusion network combination. Promising results were obtained from an experimental evaluation in a simulated domestic environment equipped with a distributed microphone network. The development and test sets were simulated using real impulse responses estimated for a large set of microphone-speaker position pairs.
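
The steps outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how agreement is measured or how posteriors are pooled, so the sketch assumes each channel is given as a flat list of (word, start, end, posterior) tuples rather than a full lattice, that a boundary counts as agreed when every channel has a boundary within a hypothetical tolerance `tol`, and that global posteriors are a simple average over channels. The names `agreed_boundaries` and `combine` are likewise hypothetical.

```python
from collections import defaultdict


def agreed_boundaries(channels, tol=0.05):
    """Boundary times of the first channel that all other channels match within tol seconds.

    Assumption: agreement is a simple time-tolerance test; the paper may use another criterion.
    """
    def bounds(hyps):
        # Collect every word start and end time of one channel.
        return sorted({t for _, start, end, _ in hyps for t in (start, end)})

    reference = bounds(channels[0])
    others = [bounds(c) for c in channels[1:]]
    return [t for t in reference
            if all(any(abs(t - u) <= tol for u in b) for b in others)]


def combine(channels, tol=0.05):
    """Build a list of {word: pooled posterior} slots, one per agreed segment.

    Assumption: a word belongs to the segment containing its temporal midpoint,
    and pooled posteriors are the per-channel posteriors averaged over channels.
    """
    cuts = agreed_boundaries(channels, tol)
    network = []
    for left, right in zip(cuts, cuts[1:]):
        scores = defaultdict(float)
        for hyps in channels:
            for word, start, end, post in hyps:
                midpoint = (start + end) / 2
                if left <= midpoint < right:
                    scores[word] += post / len(channels)
        network.append(dict(scores))
    return network
```

Calling `combine` on two or more channel lists returns one slot of competing words per agreed segment, which plays the role of the combined confusion network in this simplified setting.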

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/250433