Robust Formant Tracking in Echoic and Noisy Environments

Although formant extraction has been studied for decades, it remains a challenging task. In real-world environments in particular, where noise and echoes degrade speech processing, existing formant extraction methods yield unsatisfactory results. Here, we present a formant tracking framework specifically tailored to such difficult settings. The keys to our method are, first, an auditory-inspired preprocessing that enhances formants in spectrograms and, second, a probabilistic scheme that estimates the joint distribution of the formants. The latter in particular contributes to the robustness of our system, as it naturally accounts for the uncertainty inherent in the speech data. We demonstrate the favorable performance of our framework through a comprehensive evaluation on a publicly available database as well as with an online system operating under real-world conditions.
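
The abstract does not disclose implementation details. Purely as an illustration of how a probabilistic scheme can track the joint distribution of formants over a (preprocessed) spectrogram while carrying its uncertainty from frame to frame, the sketch below runs a toy particle filter on synthetic data. Every name, parameter, and the simple observation and dynamics models here are assumptions made for the example; they are not the authors' preprocessing or estimation method.

# Illustrative sketch (assumed, not the authors' implementation): a minimal
# particle filter that tracks two formants jointly on a toy spectrogram.
import numpy as np

rng = np.random.default_rng(0)

# Toy "enhanced" spectrogram: two slowly moving spectral peaks (synthetic data).
n_frames, n_bins = 100, 257
freqs = np.linspace(0, 4000, n_bins)                    # Hz per frequency bin
f1_true = 500 + 100 * np.sin(np.linspace(0, 2 * np.pi, n_frames))
f2_true = 1500 + 200 * np.cos(np.linspace(0, 2 * np.pi, n_frames))
spec = np.zeros((n_frames, n_bins))
for t in range(n_frames):
    spec[t] += np.exp(-0.5 * ((freqs - f1_true[t]) / 80.0) ** 2)
    spec[t] += np.exp(-0.5 * ((freqs - f2_true[t]) / 120.0) ** 2)
spec += 0.05 * rng.random((n_frames, n_bins))           # background noise

# Particle filter over the joint (F1, F2) state.
n_particles = 500
particles = np.column_stack([
    rng.uniform(300, 900, n_particles),                 # F1 hypotheses (Hz)
    rng.uniform(900, 2500, n_particles),                # F2 hypotheses (Hz)
])
weights = np.full(n_particles, 1.0 / n_particles)

def likelihood(frame, fs):
    """Toy observation model: spectral energy at the hypothesized formants."""
    idx = np.clip(np.searchsorted(freqs, fs), 0, n_bins - 1)
    return frame[idx[:, 0]] * frame[idx[:, 1]] + 1e-12

estimates = np.zeros((n_frames, 2))
for t in range(n_frames):
    # Prediction: random-walk dynamics on the formant frequencies.
    particles += rng.normal(0.0, 30.0, particles.shape)
    particles[:, 0] = np.clip(particles[:, 0], 200, 1000)
    particles[:, 1] = np.clip(particles[:, 1], 800, 3000)

    # Update: weight each joint hypothesis by the spectral energy it explains.
    weights *= likelihood(spec[t], particles)
    weights /= weights.sum()

    # Point estimate: posterior mean of the joint formant distribution.
    estimates[t] = weights @ particles

    # Systematic resampling to avoid weight degeneracy.
    cdf = np.cumsum(weights)
    u = (rng.random() + np.arange(n_particles)) / n_particles
    idx = np.minimum(np.searchsorted(cdf, u), n_particles - 1)
    particles = particles[idx]
    weights.fill(1.0 / n_particles)

print("mean |F1 error|: %.1f Hz" % np.abs(estimates[:, 0] - f1_true).mean())
print("mean |F2 error|: %.1f Hz" % np.abs(estimates[:, 1] - f2_true).mean())

Keeping the state two-dimensional (F1 and F2 together) is what makes the estimate a joint distribution rather than two independent trackers; in a real system the observation model would of course be driven by the auditory-inspired preprocessing rather than a synthetic spectrogram.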
