A novel robust solution to the permutation problem based on a joint multiple TDOA estimation

Nesta, Francesco; Omologo, Maurizio; Svaizer, Piergiorgio
2008-01-01

Abstract

This paper proposes a new robust method for multiple time-difference-of-arrival (TDOA) estimation that solves the permutation problem in frequency-domain Blind Source Separation. According to the acoustic propagation model, each separation matrix in the frequency domain can be represented by a set of states, each associated with a source. A novel transform of these states is introduced that is independent of both aliasing and permutations and performs a joint estimation of multiple TDOAs. We show that this transform generalizes GCC-PHAT to multiple sources while producing envelopes with clear peaks at the maximum-likelihood TDOAs. The propagation model then allows the permutation problem to be solved using the estimated TDOAs. Experimental results show that the proposed approach can separate two speakers from very short utterances (0.5 to 1 s) in a highly reverberant environment (T60 = 700 ms), even with widely spaced microphones.
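For reference, the GCC-PHAT that the proposed transform generalizes is the classic single-source, two-microphone TDOA estimator: whiten the cross-power spectrum so that only phase remains, inverse-transform it, and take the lag of the highest peak. The sketch below implements only that standard GCC-PHAT in pure Python. It is not the paper's multi-source transform. The names `fft`, `ifft`, `gcc_phat_tdoa` and `make_delayed_pair`, the integer-lag search over `±max_lag`, and the white-noise test signal are all hypothetical choices made for this illustration.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def ifft(spec):
    """Inverse FFT via the conjugation identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spec)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spec])]


def gcc_phat_tdoa(x1, x2, max_lag):
    """Integer lag (samples) maximizing GCC-PHAT; positive means x2 lags x1."""
    # Zero-pad to a power of two >= len(x1) + len(x2) to avoid circular wrap-around.
    n = 1
    while n < len(x1) + len(x2):
        n *= 2
    spec1 = fft(list(x1) + [0.0] * (n - len(x1)))
    spec2 = fft(list(x2) + [0.0] * (n - len(x2)))
    # PHAT weighting: keep only the phase of the cross-power spectrum.
    cross = []
    for a, b in zip(spec1, spec2):
        c = b * a.conjugate()
        cross.append(c / abs(c) if abs(c) > 1e-12 else 0j)
    corr = [v.real for v in ifft(cross)]
    # Lag tau sits at index tau (tau >= 0) or n + tau (tau < 0).
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        if corr[lag % n] > best_val:
            best_lag, best_val = lag, corr[lag % n]
    return best_lag


def make_delayed_pair(delay, length=256, seed=0):
    """Hypothetical test signal: white noise where x2[t] = x1[t - delay]."""
    rng = random.Random(seed)
    pad = abs(delay)
    s = [rng.gauss(0.0, 1.0) for _ in range(length + 2 * pad)]
    x1 = s[pad:pad + length]
    x2 = s[pad - delay:pad - delay + length]
    return x1, x2
```

For example, `gcc_phat_tdoa(*make_delayed_pair(7), max_lag=20)` should recover a lag of 7 samples. With real microphones spaced `d` metres apart, the physically admissible TDOA satisfies |τ| ≤ d/c (c being the speed of sound), so `max_lag` can be set to that bound expressed in samples.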


Use this identifier to cite or link to this document: https://hdl.handle.net/11582/8645