On the use of a weighted autocorrelation based fundamental frequency estimation for a multidimensional speech input

Accurate estimation of the fundamental frequency (F0) is a well-known and still only partially solved problem, especially for noisy speech input. This work addresses a distant-talking scenario in which a distributed microphone network provides multi-channel input sequences to be processed for speaker modeling purposes. In this context, one may process each channel independently and then apply a majority vote or another fusion method. Alternatively, the redundancy across channels can be exploited by processing the different signals jointly to obtain a more reliable and robust F0 estimate. The paper investigates a multi-channel version of a Weighted Autocorrelation (WAUTOC)-based F0 estimation technique. A corrective post-processing step is introduced to improve the accuracy of the resulting F0 estimates. Experiments conducted on a real database show the advantages and robustness of the proposed method in extracting the fundamental frequency regardless of microphone position, talker position, and head orientation.
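
To make the joint-processing idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes the usual form of the weighted autocorrelation, in which the autocorrelation at each lag is divided by the average magnitude difference function (AMDF) plus a small constant k. The multi-channel fusion shown here (summing per-channel scores after normalizing each channel to unit variance) is an assumption made for illustration, and the corrective post-processing step is not included. The function names `wautoc` and `multichannel_f0`, the default value of k, and the F0 search range are all hypothetical.

```python
import numpy as np


def wautoc(frame, min_lag, max_lag, k=1.0):
    """Return (lags, scores) with scores[i] = ACF(lag) / (AMDF(lag) + k).

    k is a stabilizing constant; its value here is an assumption.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the frame length")
    lags = np.arange(min_lag, max_lag + 1)
    scores = np.empty(len(lags))
    for i, lag in enumerate(lags):
        head, tail = frame[: n - lag], frame[lag:]
        acf = np.dot(head, tail) / n          # biased autocorrelation
        amdf = np.mean(np.abs(head - tail))   # average magnitude difference
        scores[i] = acf / (amdf + k)
    return lags, scores


def multichannel_f0(frames, fs, f0_min=60.0, f0_max=400.0, k=1.0):
    """Estimate F0 (Hz) from time-aligned frames of several channels.

    Assumed fusion rule: sum the per-channel WAUTOC scores and pick the
    lag with the largest total. This is an illustrative choice only.
    """
    min_lag = int(fs // f0_max)
    max_lag = int(fs // f0_min)
    total = np.zeros(max_lag - min_lag + 1)
    lags = np.arange(min_lag, max_lag + 1)
    for frame in frames:
        x = np.asarray(frame, dtype=float)
        x = x - x.mean()
        std = x.std()
        if std > 0:
            x = x / std                       # give each channel equal weight
        lags, scores = wautoc(x, min_lag, max_lag, k)
        total += scores
    return fs / lags[np.argmax(total)]
```

Calling `multichannel_f0([ch1, ch2, ch3], fs=16000)` with one frame per microphone returns a single F0 value in Hz for that frame. The independent-processing alternative would instead call `multichannel_f0([ch], fs)` once per channel and then fuse the per-channel estimates, for example by majority vote.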