Corrective algorithm for esophageal voice cycle detection

This paper presents a corrective solution for more accurate detection of esophageal voice cycle marks. The algorithm developed in a previous work adapts the MDVP voice period marks to esophageal voice, but it falls short when analyzing unsteady |a| vowel samples and recordings with high levels of noise. The proposed solution iteratively corrects the parameters (n min and threshold) used in the previous algorithm. The corrective system filters the noise from the voice recordings and then uses the output of the previous algorithm to calculate the most reliable fundamental frequency, based on the most stable part of the given signal. The results were obtained by comparing the pitch values of the previous algorithm with and without the proposed solution and then calculating the error against the real value. The corrective solution considerably improves the accuracy of the voice cycle detection algorithm, reducing the standard deviation of the error dataset from 4.591 to 0.593 and the mean error value from 2.459 to 0.370. Furthermore, it reduces errors above 1% by up to 54%.
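The abstract does not state how the "most stable part" of the signal is selected, so the following is only a minimal sketch under stated assumptions: cycle marks are given as sample indices, stability is measured as the coefficient of variation of consecutive cycle periods over a sliding window, and the function name `stable_f0` and its `window` parameter are hypothetical, not taken from the paper.

```python
from statistics import mean, pstdev


def stable_f0(marks, fs, window=5):
    """Estimate F0 in Hz from the most stable run of cycle periods.

    marks  : increasing sample indices of detected cycle marks (assumed input)
    fs     : sampling rate in Hz
    window : number of consecutive periods per candidate run (hypothetical)
    """
    # Period lengths, in samples, between consecutive cycle marks.
    periods = [b - a for a, b in zip(marks, marks[1:])]
    if len(periods) < window:
        raise ValueError("not enough cycle marks for the chosen window")

    # Candidate runs of `window` consecutive periods.
    runs = (periods[i:i + window] for i in range(len(periods) - window + 1))

    # Assumed stability criterion: lowest coefficient of variation.
    best = min(runs, key=lambda run: pstdev(run) / mean(run))

    # Mean period of the chosen run, converted to a frequency.
    return fs / mean(best)


# Usage: an unsteady onset followed by steady 100-sample cycles at 8 kHz.
marks = [0, 130, 210, 350, 450, 550, 650, 750, 850]
f0 = stable_f0(marks, fs=8000)
```

In this usage example the irregular periods at the start are ignored because the final run of equal periods has the lowest variation, so the estimate reflects only the steady portion of the recording.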
