Detecting Nasal Vowels in Speech Interfaces Based on Surface Electromyography

Nasality is an important characteristic of several languages, European Portuguese among them. This paper addresses the challenge of nasality detection in speech interfaces based on surface electromyography (EMG). We investigate whether the EMG signal carries useful information about velum movement, whether muscles located deeper in the face and neck region can be measured with surface electrodes, and which electrode location is best suited to do so. Our procedure uses Real-Time Magnetic Resonance Imaging (RT-MRI) data, collected from a set of speakers, as a means to interpret the EMG data. We ensure compatible recording conditions and proper time alignment between the EMG and RT-MRI data. This lets us accurately estimate when the velum moves and what type of movement occurs when a nasal vowel is produced. Combining these two sources revealed distinct characteristics in the EMG signal when a nasal vowel is uttered, which motivated a classification experiment. Overall, the results of this experiment provide evidence that velum movement can be detected using sensors positioned below the ear, between the mastoid process and the mandible, in the upper neck region. In a frame-based classification scenario, we achieved nasal vowel detection error rates as low as 32.5% across all speakers and 23.4% for the best speaker. This outcome is encouraging. It lays the groundwork for a deeper exploration of the proposed approach as a promising route towards EMG-based speech interfaces for languages with strong nasal characteristics.
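The abstract reports frame-based error rates but does not state the frame length, the EMG features, or the classifier that was used. The following minimal sketch, written under stated assumptions, shows how a frame-level error rate of this kind can be computed. The EMG recording is split into non-overlapping frames. Each frame is labelled nasal (1) or non-nasal (0) depending on whether its centre falls inside a nasal interval; here we assume these intervals come from the time-aligned RT-MRI velum annotation. The error rate is then the fraction of frames whose predicted label differs from the reference label. The function names `frame_labels` and `frame_error_rate`, the 100 ms frame length, and the frame-centre labelling rule are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def frame_labels(n_samples: int, fs: float, frame_len_s: float,
                 nasal_intervals: Sequence[Tuple[float, float]]) -> List[int]:
    """Label non-overlapping frames: 1 if the frame centre lies in a nasal interval.

    Assumption (hypothetical): nasal_intervals are (start_s, end_s) pairs in
    seconds, already aligned to the EMG time axis (e.g. derived from RT-MRI).
    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(frame_len_s * fs))  # frame length in samples
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = n_samples // frame_len
    labels = []
    for i in range(n_frames):
        # Frame centre time in seconds.
        centre = (i * frame_len + frame_len / 2) / fs
        nasal = any(start <= centre < end for start, end in nasal_intervals)
        labels.append(1 if nasal else 0)
    return labels


def frame_error_rate(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of frames whose predicted label differs from the reference."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("no frames to score")
    errors = sum(r != p for r, p in zip(reference, predicted))
    return errors / len(reference)


if __name__ == "__main__":
    # Hypothetical example: 1 s of EMG at 1000 Hz, 100 ms frames,
    # one nasal interval from 0.3 s to 0.6 s.
    ref = frame_labels(1000, 1000.0, 0.1, [(0.3, 0.6)])
    # A hypothetical classifier output that flags frame 2 and misses frame 5.
    pred = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
    print(frame_error_rate(ref, pred))  # prints 0.2
```

In this example frames 3 to 5 are labelled nasal. The prediction produces one false alarm and one miss, giving an error rate of 2/10 = 0.2. We score by frame rather than by utterance because it matches the frame-based scenario named in the abstract. A real evaluation would also need to decide how to handle overlapping frames and frames that straddle interval boundaries.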
