Investigation and Evaluation of Voice Stress Analysis Technology

Abstract: In recent years, vendors have approached numerous police officers and agencies touting computer-based systems capable of measuring stress in a person's voice as an indicator of deception. These systems are advertised as cheaper, easier to use, less invasive, and less constrained in their operation than polygraph technology. Vendors claim that a speaker's medical condition, age, or consumption of drugs does not affect use of their systems. Voice stress analysis does not require physical attachment of the system to the speaker's body and does not require that answers be restricted to 'yes' and 'no'. According to some vendors, any spoken word or even a groan, whether recorded, videotaped, or spoken in person, with or without the speaker's knowledge, is an acceptable input to a voice stress analysis system. The value of voice stress analysis technology for military application could be extensive. During military field interrogations of potential informants, it could be applied in a manner similar to its application in law enforcement. It is also not known whether stressed speech affects the accuracy of speech technology such as speaker identification and language identification. If voice stress can be detected, perhaps it can be taken into account when applying voice recognition technology and used to improve recognition capabilities. This effort therefore aims to determine the scientific value and utility of existing commercial voice stress analysis technology for law enforcement and military applications.
