Comparative Analysis of Spectral and Cepstral Feature Extraction Techniques for Phoneme Modelling

A phoneme parameter extraction framework based on spectral and cepstral parameters is proposed. Within this framework, the phoneme signal is divided into frames and a Hamming window is applied to each frame. Performance is evaluated for the recognition of Lithuanian vowel and semivowel phonemes. Different feature sets are considered, both in clean conditions and at different noise levels. Two classical machine learning methods (Naive Bayes and Support Vector Machine) are used to classify each problem separately. The experimental results show that cepstral parameters yield higher accuracies than spectral parameters, and this advantage is retained in noisy conditions.
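The sketch below is a minimal illustration of the processing chain described above, not the authors' implementation: it frames a signal, applies a Hamming window, derives simple spectral (log-magnitude spectrum) and cepstral (real cepstrum) features, and classifies them with Naive Bayes and an SVM. The frame length, hop size, number of coefficients, and the dummy data are illustrative assumptions.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames and apply a Hamming window."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    window = np.hamming(frame_len)
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])
    return frames * window

def spectral_features(frames, n_coeffs=13):
    """Log-magnitude spectrum of each windowed frame, truncated to n_coeffs bins."""
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(spectrum + 1e-10)[:, :n_coeffs]

def cepstral_features(frames, n_coeffs=13):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum of each frame."""
    log_spectrum = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)
    cepstrum = np.fft.irfft(log_spectrum, axis=1)
    return cepstrum[:, :n_coeffs]

# Dummy labelled phoneme recordings stand in for the Lithuanian vowel/semivowel data.
rng = np.random.default_rng(0)
signals = [rng.standard_normal(16000) for _ in range(4)]   # placeholder 1-second clips
labels = [0, 1, 0, 1]                                      # placeholder phoneme classes

# One feature vector per recording: frame-level cepstral features averaged over frames.
X = np.vstack([cepstral_features(frame_signal(s)).mean(axis=0) for s in signals])
y = np.array(labels)

# The two classical classifiers used in the study.
for clf in (GaussianNB(), SVC(kernel="rbf")):
    clf.fit(X, y)
    print(type(clf).__name__, "training accuracy:", clf.score(X, y))
```

Swapping `cepstral_features` for `spectral_features` in the feature-building line reproduces the spectral variant of the same pipeline, which is how the two parameter sets can be compared under identical framing and classification settings.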
