A comparison of estimated and MAP-predicted formants and fundamental frequencies with a speech reconstruction application

This work compares the accuracy of fundamental frequency and formant frequency estimation methods, and of maximum a posteriori (MAP) prediction from MFCC vectors, against hand-corrected references. Five fundamental frequency estimation methods are compared with fundamental frequency prediction from MFCC vectors in both clean and noisy speech. Similarly, three formant frequency estimation and prediction methods are compared. An analysis of estimation and prediction accuracy shows that prediction from MFCCs gives the most accurate voicing classification in both clean and noisy speech. On clean speech, fundamental frequency estimation outperforms prediction from MFCCs, but as noise increases, prediction proves significantly more robust than estimation. Formant frequency prediction is more accurate than estimation in both clean and noisy speech. A subjective analysis of the estimation and prediction methods is also carried out by reconstructing speech from the acoustic features.
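The idea of MAP prediction from MFCCs can be illustrated with a minimal sketch. It assumes a single joint Gaussian over an MFCC vector x and f0, fitted to training frames; a practical predictor would typically use several such Gaussians (for example a mixture model), so this is not the method evaluated in this work. For a Gaussian, the MAP estimate of f0 given x is the conditional mean, f0_hat = mu_f + Sigma_fx Sigma_xx^-1 (x - mu_x). The helpers `fit_joint_gaussian`, `map_predict_f0` and `evaluate` are hypothetical, and `evaluate` uses assumed metric definitions (voicing error rate, and mean percentage f0 error on frames voiced in both tracks, with f0 = 0 marking an unvoiced frame) rather than the measures reported here.

```python
"""Minimal sketch: MAP prediction of f0 from an MFCC vector (hypothetical helpers).

Assumption: one joint Gaussian over [x; f0] stands in for the richer
models (e.g. mixtures) a practical predictor would use. For a Gaussian,
the MAP estimate of f0 given x is the conditional mean
    f0_hat = mu_f + Sigma_fx Sigma_xx^-1 (x - mu_x).
"""
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        y[r] = (m[r][n] - sum(m[r][c] * y[c] for c in range(r + 1, n))) / m[r][r]
    return y


def fit_joint_gaussian(mfccs: Sequence[Vector], f0s: Sequence[float]):
    """Estimate mu_x, mu_f, Sigma_xx and Sigma_fx from paired training frames."""
    n, d = len(f0s), len(mfccs[0])
    mu_x = [sum(v[i] for v in mfccs) / n for i in range(d)]
    mu_f = sum(f0s) / n
    dx = [[v[i] - mu_x[i] for i in range(d)] for v in mfccs]
    df = [f - mu_f for f in f0s]
    s_xx = [[sum(dx[k][i] * dx[k][j] for k in range(n)) / n for j in range(d)]
            for i in range(d)]
    s_fx = [sum(df[k] * dx[k][i] for k in range(n)) / n for i in range(d)]
    return mu_x, mu_f, s_xx, s_fx


def map_predict_f0(x: Vector, mu_x: Vector, mu_f: float,
                   s_xx: Matrix, s_fx: Vector) -> float:
    """Conditional mean of f0 given x, i.e. the MAP estimate under the Gaussian."""
    w = solve(s_xx, [xi - mi for xi, mi in zip(x, mu_x)])  # Sigma_xx^-1 (x - mu_x)
    return mu_f + sum(a * b for a, b in zip(s_fx, w))


def evaluate(pred_f0: Sequence[float], ref_f0: Sequence[float]) -> Tuple[float, float]:
    """Return (voicing error %, mean % f0 error on frames voiced in both tracks)."""
    pairs = list(zip(pred_f0, ref_f0))
    if not pairs:
        raise ValueError("no frames to evaluate")
    voicing_errors = sum((p > 0) != (r > 0) for p, r in pairs)
    both = [(p, r) for p, r in pairs if p > 0 and r > 0]
    f0_err = (100.0 * sum(abs(p - r) / r for p, r in both) / len(both)
              if both else float("nan"))
    return 100.0 * voicing_errors / len(pairs), f0_err


if __name__ == "__main__":
    # Toy data in which f0 is an exact linear function of a 2-D "MFCC" vector.
    train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    train_f0 = [100 + 10 * a - 5 * b for a, b in train_x]
    params = fit_joint_gaussian(train_x, train_f0)
    print(round(map_predict_f0([3.0, 2.0], *params), 6))  # 120.0
    print(evaluate([0, 110, 0, 200], [0, 100, 150, 200]))
```

Because the toy f0 values are an exact linear function of x, the conditional mean recovers that function, which makes the sketch easy to check; real MFCC-to-f0 relationships are far noisier, which is why richer models would be needed in practice.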
