Engineering Innovation in Speech Science: Data and Technologies

As more speech data of more kinds become accessible, the health care and technology industries increasingly demand quantitative insight into speech content. The potential for speech data to inform our understanding of cognitive, affective, and psychological health states and behavior depends crucially on the ability to integrate those data into the scientific process. Current engineering methods for acquiring, analyzing, and modeling speech data make this integration possible. Additionally, machine learning systems recognize patterns in data that can facilitate hypothesis generation, data analysis, and statistical modeling. The goals of the present article are (a) to review developments across these domains that have allowed real-time magnetic resonance imaging to shed light on aspects of atypical speech articulation; (b) in a parallel vein, to discuss how advances in signal processing have improved the understanding of communication markers associated with autism spectrum disorder; and (c) to highlight the clinical significance and implications of applying these technological advances to each of these areas. The collaboration of engineers, speech scientists, and clinicians has resulted in (a) the development of biologically inspired technology that has proven useful for both small- and large-scale analyses, (b) a deepened practical and theoretical understanding of both typical and impaired speech production, and (c) the establishment and enhancement of diagnostic and therapeutic tools, all having far-reaching, interdisciplinary significance. https://doi.org/10.23641/asha.7740191
