Analysis of Closing-to-Opening Phase Ratio in Top-to-Bottom Glottal Pulse Segmentation for Psychological Stress Detection

This paper investigates the differences in glottal pulses estimated by two algorithms, Direct Inverse Filtering (DIF) and Iterative and Adaptive Inverse Filtering (IAIF), for normal and stressed speech. Individual glottal pulses are extracted from the recorded speech signal and normalized in two dimensions. Each normalized pulse is divided into a closing and an opening phase and further segmented into n-percentage sectors in the Top-To-Bottom (TTB) amplitude domain. Three parameters, the kurtosis, skewness, and pulse area, as well as their Closing-To-Opening phase ratios, are analysed. A GMM classifier is trained on speakers from the Czech ExamStress database and then applied to the remaining part of ExamStress as well as to the English SUSAS database, to investigate the independence of the presented approach from the spoken language and the speech signal quality. The results achieved with DIF indicate independence from language and recording quality, in contrast to the methods using IAIF. The best n-percentage sectors in the TTB segments lie between 5 % and 40 %; there, the DIF-based methods reached a psychological stress recognition efficiency of 88.5 % on average, while the average stress detection efficiency of the IAIF-based methods approached 73.3 %. DOI: http://dx.doi.org/10.5755/j01.eie.22.5.16348
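The core of the described feature extraction, splitting a normalized glottal pulse into closing and opening phases and forming Closing-To-Opening ratios of kurtosis, skewness, and pulse area, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the split-at-peak convention, the function name, and the omission of the n-percentage TTB sector segmentation are all assumptions.

```python
import numpy as np

def phase_parameters(pulse):
    """Split a normalized glottal pulse at its peak into an opening phase
    (before the peak) and a closing phase (from the peak on), then compute
    kurtosis, skewness and area for each phase and their C-to-O ratios.
    Hypothetical sketch; the TTB n-percentage sector step is omitted."""
    peak = int(np.argmax(pulse))
    opening = pulse[:peak + 1]   # assumption: rising part = opening phase
    closing = pulse[peak:]       # assumption: falling part = closing phase

    def stats(x):
        m, s = x.mean(), x.std()
        skewness = np.mean(((x - m) / s) ** 3)   # 3rd standardized moment
        kurt = np.mean(((x - m) / s) ** 4)       # 4th standardized moment
        area = ((x[:-1] + x[1:]) / 2.0).sum()    # trapezoidal pulse area
        return skewness, kurt, area

    so, ko, ao = stats(opening)
    sc, kc, ac = stats(closing)
    # Closing-To-Opening phase ratios of the three parameters
    return {"skew_ratio": sc / so,
            "kurt_ratio": kc / ko,
            "area_ratio": ac / ao}
```

In the paper, such ratios computed over the selected TTB sectors would then feed the GMM classifier as features; here the function simply returns the three whole-phase ratios for one pulse.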
