Physical task stress and speaker variability in voice quality

The presence of physical task stress induces changes in the speech production system, and these in turn alter speaking behavior. The alterations have measurable acoustic correlates, including shifts in formant center frequencies, breath pause placement, and fundamental frequency. Many of them arise from the subject's internal competition between speaking and breathing while performing the physical task. This competition affects muscle control and airflow in both the glottal excitation structure and the vocal tract articulatory structure. This study considers the effect of physical task stress on voice quality. Voice quality is measured with three signal processing-based values: (i) the normalized amplitude quotient (NAQ), (ii) the harmonic richness factor (HRF), and (iii) the fundamental frequency. The effects of physical stress on voice quality depend on both the speaker and the specific task. Some speakers show no change in voice quality. A subset, however, shows changes in NAQ and HRF similar in magnitude to those reported in studies of soft, loud, and pressed speech. For speakers who do show voice quality changes, the changes tend toward breathy or soft voicing, as observed in other studies. The effect of physical stress on the fundamental frequency is correlated with its effect on the HRF (r = −0.34) and on the NAQ (r = −0.53). In addition, the inter-speaker variation in baseline NAQ is significantly higher than the variation in NAQ induced by physical task stress. The results show systematic changes in speech production under physical task stress. In principle, these changes will affect downstream speech technology such as speech recognition, speaker recognition, and speaker diarization systems.
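
The voice quality measures named above can be illustrated with a short sketch. It is not the authors' analysis pipeline, and it makes several assumptions:

- A glottal flow estimate is already available as a segment spanning exactly one period. Inverse filtering and cycle segmentation are not shown.
- NAQ follows the definition of Alku et al.: f_ac / (d_peak · T).
- HRF uses the amplitude-ratio form 20·log10(Σ_{k≥2} H_k / H_1) from Childers and Lee. Some studies use a power ratio instead.
- The helper names `naq`, `hrf_db`, and `pearson_r` are hypothetical.
- The raised-cosine and three-harmonic signals are toy waveforms, not speech data.

```python
import numpy as np


def naq(flow_cycle, fs):
    """Normalized amplitude quotient (NAQ) of one glottal cycle.

    Assumed definition: NAQ = f_ac / (d_peak * T), where
      f_ac   = peak-to-peak (AC) amplitude of the glottal flow,
      d_peak = magnitude of the negative peak of the flow derivative,
      T      = period length in seconds.
    The ratio is dimensionless; breathier phonation tends to give higher NAQ.
    """
    flow_cycle = np.asarray(flow_cycle, dtype=float)
    T = len(flow_cycle) / fs
    f_ac = flow_cycle.max() - flow_cycle.min()
    # Treat the segment as one period of a periodic signal, so the
    # derivative wraps from the last sample back to the first.
    d_flow = np.diff(flow_cycle, append=flow_cycle[0]) * fs
    d_peak = -d_flow.min()
    return f_ac / (d_peak * T)


def hrf_db(flow_cycle, n_harmonics=10):
    """Harmonic richness factor (HRF) in dB for one glottal cycle.

    Assumed form: 20*log10(sum_{k=2..K} H_k / H_1), with H_k the DFT
    magnitude of harmonic k. Because the segment spans exactly one period,
    DFT bin k falls exactly on harmonic k, so no window is applied.
    """
    spectrum = np.abs(np.fft.rfft(np.asarray(flow_cycle, dtype=float)))
    h1 = spectrum[1]
    higher = spectrum[2:n_harmonics + 1].sum()
    return 20.0 * np.log10(higher / h1)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


if __name__ == "__main__":
    fs, f0 = 16000, 100.0
    n = np.arange(int(fs / f0))  # exactly one 10 ms period (160 samples)

    # Toy raised-cosine "flow" pulse; its NAQ is close to 1/pi.
    flow = 0.5 * (1.0 - np.cos(2 * np.pi * n / len(n)))
    print(round(naq(flow, fs), 3))  # 0.318

    # Toy signal with harmonic amplitudes 1, 0.5, 0.25 -> ratio 0.75.
    k = 2 * np.pi * n / len(n)
    toy = np.cos(k) + 0.5 * np.cos(2 * k) + 0.25 * np.cos(3 * k)
    print(round(hrf_db(toy), 2))  # -2.5
```

The sketch can be extended to a correlation of the kind reported above. One plausible approach, assumed here for illustration, is as follows:

1. For each speaker, compute the change in mean F0 between baseline and physical-task speech.
2. For the same speaker, compute the corresponding change in mean NAQ (or HRF).
3. Pass the two arrays of per-speaker changes to `pearson_r`.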
