On the Use of a Weighted Autocorrelation Based Fundamental Frequency Estimation for a Multidimensional Speech Input

Flego, Federico;Omologo, Maurizio
2004-01-01

Abstract

Accurately computing the fundamental frequency F0 is a well-known and still only partially solved problem, especially for noisy speech input. This work addresses a distant-talking scenario in which a distributed microphone network provides multi-channel input sequences to be processed for speaker modeling purposes. In this context, one may process each channel independently and then apply a majority vote or another fusion method. Alternatively, the redundancy across channels can be exploited by processing the signals jointly to obtain a more reliable and robust F0 estimate. The paper investigates a multi-channel version of a Weighted Autocorrelation (WAUTOC)-based F0 estimation technique. A post-processing corrective step is introduced to improve the resulting F0 accuracy. Experiments conducted on a real database show the advantages and robustness of the proposed method in extracting the fundamental frequency regardless of microphone position, talker position, and head orientation.
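
The joint-processing idea in the abstract can be illustrated with a rough sketch. This record does not give the paper's formulation, so everything below is an assumption: the weighted autocorrelation is taken in its commonly described form (autocorrelation divided by the average magnitude difference function plus a small constant), the channels are fused by simply summing their per-lag scores, and the post-processing corrective step is not shown. The names `wautoc`, `estimate_f0`, `eps`, `f0_min` and `f0_max` are hypothetical.

```python
def wautoc(frame, max_lag, eps=0.1):
    """Per-lag score: autocorrelation divided by (AMDF + eps).

    Assumed form of the weighted autocorrelation; eps is a hypothetical
    stabilising constant that avoids division by zero.
    """
    n = len(frame)
    scores = []
    for lag in range(max_lag + 1):
        m = n - lag  # number of overlapping sample pairs at this lag
        # Biased autocorrelation (divide by n), which slightly penalises
        # longer lags and so discourages picking a multiple of the period.
        acf = sum(frame[i] * frame[i + lag] for i in range(m)) / n
        # Average magnitude difference: near zero at the true period.
        amdf = sum(abs(frame[i] - frame[i + lag]) for i in range(m)) / m
        scores.append(acf / (amdf + eps))
    return scores


def estimate_f0(channels, fs, f0_min=60.0, f0_max=400.0):
    """Estimate F0 in Hz for one frame observed on several channels.

    Assumption: channels are fused by summing their per-lag scores.
    Each frame must be longer than fs / f0_min samples.
    """
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    summed = [0.0] * (lag_max + 1)
    for frame in channels:
        for lag, score in enumerate(wautoc(frame, lag_max)):
            summed[lag] += score
    # Pick the lag with the highest fused score inside the search range.
    best_lag = max(range(lag_min, lag_max + 1), key=lambda lag: summed[lag])
    return fs / best_lag
```

Called as `estimate_f0([frame_mic1, frame_mic2], fs=8000)` with equal-length frames of the same time segment from each microphone, the function returns a single F0 value for that frame. Summing the scores across channels is only one possible fusion rule and stands in for whatever joint processing the paper actually uses.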

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/2316