Modeling the Effect of Military Oxygen Masks on Speech Characteristics

Wearing an oxygen mask changes speech production: the mask modifies the vocal apparatus and perturbs the speaker's articulatory movements. This paper studies the impact of the oxygen masks worn by military aircraft pilots on formant trajectories, both dynamically (variations of the formants at the utterance level) and globally (mean values at the utterance level), for 12 speakers. A comparative analysis of speech collected with and without an oxygen mask shows that the mask has a significant impact on the formant trajectories, affecting both the mean values and the formant variations at the utterance level. This impact depends strongly on the speaker and on the mask model. These observations suggest that the presence of the mask modifies the speaker's articulatory movements. They are validated in a preliminary automatic speech recognition (ASR) experiment that uses a data augmentation technique based on articulatory perturbations driven by our experimental observations.
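The two utterance-level measures mentioned above, a global mean and a measure of formant variation, can be illustrated with a minimal sketch. It assumes that formant tracks are already available as arrays of shape (frames, formants) in Hz, with NaN where no formant was estimated, and it uses the standard deviation as the variation measure. The function names, the choice of standard deviation, and the toy values are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def utterance_formant_stats(track):
    """Return per-formant mean and standard deviation over one utterance.

    track: array-like of shape (n_frames, n_formants), in Hz.
    Frames without a formant estimate are expected to hold NaN and are ignored.
    """
    track = np.asarray(track, dtype=float)
    mean = np.nanmean(track, axis=0)  # global measure
    std = np.nanstd(track, axis=0)    # variation measure (an assumption)
    return mean, std

def mask_effect(no_mask_track, mask_track):
    """Compare one utterance spoken without and with a mask (hypothetical helper)."""
    mean_ref, std_ref = utterance_formant_stats(no_mask_track)
    mean_mask, std_mask = utterance_formant_stats(mask_track)
    return {
        "mean_shift_hz": mean_mask - mean_ref,  # shift of the mean formant values
        "std_ratio": std_mask / std_ref,        # below 1 means reduced variation
    }

# Toy F1/F2 tracks (made-up values, not data from the study).
no_mask = [[500, 1500], [600, 1700], [np.nan, np.nan], [700, 1900]]
mask = [[480, 1400], [520, 1450], [560, 1500]]
effect = mask_effect(no_mask, mask)
print(effect["mean_shift_hz"], effect["std_ratio"])
```

In this toy case the masked utterance has lower mean formant values and a smaller spread. An actual analysis would repeat the comparison per speaker and per mask model, since the abstract reports that the effect depends on both.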