An Interesting Property of LPCs for Sonorant vs. Fricative Discrimination

The linear prediction (LP) technique estimates an optimal all-pole filter of a given order for a frame of a speech signal. The coefficients of the all-pole filter 1/A(z) are referred to as LP coefficients (LPCs). The gain of the inverse filter A(z) at z = 1, i.e., at frequency 0, is A(1), which corresponds to the sum of the LPCs and has the property of being lower than a threshold for sonorants and higher for fricatives. When the arctangent of A(1), denoted T(1), is used as a feature and tested on the sonorant and fricative frames of the entire TIMIT database, an accuracy of 99.07% is obtained. Hence, we refer to T(1) as the sonorant-fricative discrimination index (SFDI). This property has also been tested for its robustness to additive white noise and on the telephone-quality speech of the NTIMIT database. These results are comparable to, and in some respects better than, those of state-of-the-art methods proposed for a similar task. Such a property may be used for segmenting a speech signal or for non-uniform frame-rate analysis.
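The feature described above can be sketched in a few lines: estimate the LP coefficients for a frame, evaluate A(1) as the sum of the coefficients of A(z), and take its arctangent. The sketch below, a minimal illustration rather than the authors' implementation, uses the standard autocorrelation method with the Levinson-Durbin recursion; the function names `lpc` and `sfdi`, the LP order of 10, and the synthetic "sonorant-like" and "fricative-like" frames are all assumptions for illustration (the paper's evaluation uses TIMIT frames and a tuned threshold).

```python
import numpy as np

def lpc(frame, order):
    """Coefficients [1, a_1, ..., a_p] of the inverse filter A(z),
    estimated by the autocorrelation method and the Levinson-Durbin recursion."""
    n = len(frame)
    r = np.array([frame[: n - i] @ frame[i:] for i in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(a[:i] @ r[i:0:-1]) / err          # reflection coefficient
        a[1 : i + 1] = a[1 : i + 1] + k * a[i - 1 :: -1]
        err *= 1.0 - k * k                      # residual prediction error
    return a

def sfdi(frame, order=10):
    """T(1) = arctan(A(1)); A(1) is the gain of A(z) at frequency 0,
    i.e., the sum of the coefficients of A(z)."""
    return np.arctan(np.sum(lpc(frame, order)))

# Illustrative frames (not TIMIT data): a low-frequency periodic
# "sonorant-like" frame and a high-pass-noise "fricative-like" frame.
rng = np.random.default_rng(0)
t = np.arange(240) / 8000                       # 30 ms frame at 8 kHz
sonorant = sum(np.sin(2 * np.pi * f * t) for f in (150, 300, 450))
sonorant = sonorant + 0.01 * rng.standard_normal(t.size)
fricative = np.diff(rng.standard_normal(t.size + 1))  # first difference boosts high frequencies
print(sfdi(sonorant), sfdi(fricative))
```

Because A(z) is minimum-phase, A(1) stays positive: it is small when the frame's spectrum is dominated by low frequencies (sonorants) and large when high frequencies dominate (fricatives), so a single threshold on T(1) separates the two classes; the arctangent merely compresses A(1) into a bounded range. Note that sign conventions for LPCs differ across texts, so "sum of LPCs" here means the sum of all coefficients of A(z), including the leading 1.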
