Modeling audio directional statistics using a probabilistic dictionary for speaker diarization in real meetings

Abdelraheem, Mahmoud Fakhry Mahmoud
2016-01-01

Abstract

Speaker diarization is the task of estimating “who spoke when” in a meeting. Accurate diarization of real meetings has to cope with noise, speaker overlap, reverberation, and similar distortions. In this work, we propose to model the directional statistics of spatial clusters via a dictionary of probabilistic models. The dictionary is trained on spatial features of possible source locations. Observed mixtures of multiple source signals are represented statistically as a weighted sum of the trained models, where each weight indicates the activity of the source associated with a spatial location or cluster. To detect the active clusters and perform speaker diarization, the weights are estimated by applying Bayes' rule. Furthermore, a Laplace distribution is proposed to model the background noise. The proposed method was evaluated on real meetings and achieved high performance compared with a baseline method.
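
The weighted-sum idea in the abstract can be illustrated with a small sketch. This is not the author's implementation: it assumes one-dimensional spatial features, Gaussian dictionary entries with already-trained means and standard deviations, a single Laplace component for background noise, and an EM-style fixed-point update in which Bayes' rule gives the posterior of each component per observation. All function names, parameter values, and the activity threshold are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, standing in for one trained dictionary entry."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def laplace_pdf(x, loc, scale):
    """Density of a 1-D Laplace distribution, used here for background noise."""
    return np.exp(-np.abs(x - loc) / scale) / (2.0 * scale)


def estimate_weights(x, means, stds, noise_loc=0.0, noise_scale=2.0, n_iter=50):
    """Estimate mixture weights for a fixed dictionary (assumed setup).

    Returns an array of length len(means) + 1; the last entry is the
    weight of the Laplace background component.
    """
    x = np.asarray(x, dtype=float)
    # Likelihood of every observation under every fixed component: shape (N, K + 1).
    lik = np.column_stack(
        [gaussian_pdf(x, m, s) for m, s in zip(means, stds)]
        + [laplace_pdf(x, noise_loc, noise_scale)]
    )
    w = np.full(lik.shape[1], 1.0 / lik.shape[1])  # uniform initial weights
    for _ in range(n_iter):
        joint = lik * w                                   # prior times likelihood
        post = joint / joint.sum(axis=1, keepdims=True)   # Bayes' rule per observation
        w = post.mean(axis=0)                             # updated weights
    return w


def active_clusters(w, threshold=0.1):
    """Indices of spatial clusters whose weight exceeds a hypothetical threshold."""
    return np.flatnonzero(w[:-1] > threshold)
```

For a segment of features drawn around two of three candidate locations, calling `estimate_weights(x, means, stds)` followed by `active_clusters(w)` would flag those two clusters as active, while the unused location keeps a weight near zero. In a diarization setting this would be repeated per time segment, which is an assumption about usage rather than a detail given in the abstract.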

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/307363