Speaker front-back disambiguity using multi-channel speech signals
Qian, Xinyuan; Brutti, Alessio
2022-01-01
Abstract
This paper tackles the front-back disambiguity problem in speaker localization when the audio signals are captured by a symmetric microphone array. To this end, a deep neural network is proposed with an attention-based mechanism designed to assign different weights to features obtained from individual microphones. For support, a real dataset with synchronized multi-channel audio signals captured by a large linear microphone array is introduced, along with manual annotations. The experimental results demonstrate the effectiveness of the proposed method over other approaches. In particular, a reduction of more than 50% in Equal Error Rate (EER) is achieved compared with the single-channel case. The designed multi-channel self-attention mechanism brings further improvements. The dataset and source code will be released.
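To illustrate the kind of channel weighting described in the abstract, the minimal PyTorch sketch below applies scaled dot-product self-attention across per-microphone feature vectors and pools them into a single embedding. The module name, feature dimensions, layer sizes, and pooling choice are illustrative assumptions and do not reproduce the authors' exact architecture.

```python
# Minimal sketch of multi-channel self-attention over per-microphone features.
# Shapes, layer sizes, and the module name are illustrative assumptions,
# not the authors' exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelSelfAttention(nn.Module):
    """Weights per-microphone feature vectors with scaled dot-product self-attention."""

    def __init__(self, feat_dim: int, attn_dim: int = 64):
        super().__init__()
        self.query = nn.Linear(feat_dim, attn_dim)
        self.key = nn.Linear(feat_dim, attn_dim)
        self.value = nn.Linear(feat_dim, feat_dim)
        self.scale = attn_dim ** 0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_channels, feat_dim), one feature vector per microphone
        q, k, v = self.query(x), self.key(x), self.value(x)
        # Attention scores across channels: (batch, n_channels, n_channels)
        scores = torch.matmul(q, k.transpose(1, 2)) / self.scale
        weights = F.softmax(scores, dim=-1)
        # Re-weighted channel features, then pooled into a single embedding
        attended = torch.matmul(weights, v)   # (batch, n_channels, feat_dim)
        return attended.mean(dim=1)           # (batch, feat_dim)


if __name__ == "__main__":
    batch, n_channels, feat_dim = 8, 16, 128            # e.g. a 16-microphone linear array (assumed)
    feats = torch.randn(batch, n_channels, feat_dim)    # per-channel features (placeholder)
    pooled = ChannelSelfAttention(feat_dim)(feats)
    print(pooled.shape)                                 # torch.Size([8, 128])
```

In a setup of this kind, the pooled embedding would presumably feed a downstream front/back classifier, with the attention weights letting the network emphasize the microphones most informative for resolving the symmetry.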