Classification of speech under stress using harmonic peak to energy ratio

A new feature, the harmonic peak to energy ratio (HPER), is proposed for the analysis and classification of speech under stress. The significance of the HPER feature is explored using statistical analysis. A binary-cascade multi-class classification strategy is used, based on the valence-activation descriptors of different stress conditions. Performance is evaluated using a Support Vector Machine (SVM) classifier.

This paper explores the analysis and classification of speech under stress using a new feature, the harmonic peak to energy ratio (HPER). The HPER feature is computed from the Fourier spectrum of the speech signal. Harmonic amplitudes are closely related to the breathiness level of speech, and breathiness levels may differ across stress conditions. Statistical analysis shows that the proposed HPER feature is useful for characterizing various stress classes. An SVM classifier with a binary-cascade strategy is used to evaluate the performance of the HPER feature on a simulated stressed speech database (SSD). The results show that the HPER feature successfully characterizes different stress conditions. Its performance is compared with mel-frequency cepstral coefficients (MFCC), linear prediction coefficients (LPC), and the Teager-Energy-Operator (TEO) based Critical Band TEO Autocorrelation Envelope (TEO-CB-Auto-Env) features. The proposed HPER feature outperforms the MFCC, LPC, and TEO-CB-Auto-Env features, and combining the HPER feature with the MFCC feature further increases system performance.
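Since the abstract does not give the exact formula, the following is only a minimal sketch of what a harmonic-peak-to-energy-ratio feature could look like: the magnitudes of spectral peaks near the first few harmonics of the fundamental frequency are summed and divided by the total spectral energy of the frame. The function name `hper`, the number of harmonics, the search-band width, and the supplied `f0` are all illustrative assumptions, not the paper's definition.

```python
import numpy as np

def hper(frame, fs, f0, n_harmonics=10):
    """Hypothetical HPER sketch: ratio of summed harmonic-peak power to
    total spectral energy of one windowed frame. The paper's exact
    formulation may differ."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    total_energy = np.sum(spectrum ** 2) + 1e-12  # guard against all-zero frames
    peak_power = 0.0
    for k in range(1, n_harmonics + 1):
        target = k * f0
        if target > fs / 2:
            break  # harmonic lies above Nyquist
        # take the strongest bin within +/- f0/4 of the k-th harmonic
        band = (freqs > target - f0 / 4) & (freqs < target + f0 / 4)
        if band.any():
            peak_power += spectrum[band].max() ** 2
    return peak_power / total_energy

# Synthetic voiced-like frame: five harmonics of 100 Hz plus weak noise.
fs, f0 = 8000, 100.0
t = np.arange(512) / fs
frame = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))
frame = frame + 0.01 * np.random.default_rng(0).standard_normal(len(t))
print(hper(frame, fs, f0))
```

A strongly harmonic (less breathy) frame concentrates energy at the harmonic peaks and yields a ratio near 1, while a noisier, breathier frame spreads energy between harmonics and yields a lower value, which is the kind of contrast the abstract associates with different stress conditions.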
