A novel robust solution to the permutation problem based on a joint multiple TDOA estimation
Nesta, Francesco; Omologo, Maurizio; Svaizer, Piergiorgio
2008-01-01
Abstract
This paper proposes a new robust method for estimating multiple TDOAs (time differences of arrival) in order to solve the permutation problem in frequency-domain Blind Source Separation. According to the acoustic propagation model, each separation matrix in the frequency domain can be represented by a set of states associated with each source. A novel transform of the states is introduced that is independent of the aliasing and of the permutations and that performs a joint estimation of multiple TDOAs. We show that this transform generalizes GCC-PHAT to multiple sources and at the same time generates envelopes with clear peaks corresponding to the maximum-likelihood TDOAs. By means of the propagation model, the permutation problem is then solved using the estimated TDOAs. Experimental results show that the proposed approach can separate two speakers using very short utterances (0.5-1 s) in a highly reverberant environment (T60 = 700 ms), even with widely spaced microphones.
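For orientation, the classical two-microphone, single-source GCC-PHAT estimator that the abstract refers to can be sketched as follows. This is only the standard baseline, not the multi-source transform proposed in the paper. The function name `gcc_phat`, its parameters, and the synthetic white-noise test signal are assumptions made for illustration.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the TDOA (in seconds) of x1 relative to x2 with GCC-PHAT.

    A positive result means x1 is a delayed copy of x2.
    Hypothetical helper for illustration; not the paper's algorithm.
    """
    n = len(x1) + len(x2)              # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12     # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)      # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs, cc

# Synthetic check: x1 is white noise delayed by 5 samples relative to x2
fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
x2 = s
x1 = np.concatenate((np.zeros(5), s[:-5]))
tau, _ = gcc_phat(x1, x2, fs, max_tau=0.002)
print(f"estimated delay: {tau * fs:.0f} samples")
```

With a single source the correlation has one dominant peak. With several simultaneous sources and strong reverberation the peaks of this pairwise estimator become ambiguous, which is the situation the joint multiple-TDOA estimation described in the abstract is meant to address.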