Lexical Tone Recognition with an Artificial Neural Network

Objectives: Tone production is particularly important for communicating in tone languages such as Mandarin Chinese. In the present study, an artificial neural network was used to recognize tones produced by adult native speakers. The purposes of the study were (1) to test the sensitivity of the neural network to the speaker variation typical of adult speaker groups, (2) to evaluate two normalization procedures for overcoming the effects of speaker variation, and (3) to compare the tone recognition performance of the neural network with that of human listeners.

Design: A feedforward multilayer neural network was used. Twenty-nine adult native Mandarin Chinese speakers were recruited to record tone samples. The F0 contours of the vowel part of the 1044 recorded monosyllabic words were extracted using an autocorrelation method. Samples from the F0 contours were used as inputs to the neural network. The efficacy of the neural network was first tested by varying the number of inputs and the number of neurons in the hidden layer from 1 to 16. The sensitivity of the neural network to speaker variation was then tested by (1) using the raw F0 data from speech tokens of a number of randomly drawn speakers that varied from 1 to 29, (2) using the raw F0 data from speech tokens of either male-only or female-only speakers, and (3) using two sets of normalized F0 data (i.e., tone 1-based normalization and first-order derivative) from speech tokens of a number of randomly drawn speakers that varied from 1 to 29. The recognition performance of the neural network under several experimental conditions was compared with the corresponding recognition performance of 10 normal-hearing, native Mandarin Chinese-speaking adult listeners.

Results: Three inputs and four hidden neurons were found to be sufficient for the neural network to perform at about 85% correct using speech samples without normalization. The performance of the neural network was affected by variation across speakers, particularly between genders. With the tone 1-based normalization procedure, the performance of the neural network improved significantly. The recognition accuracy of the neural network, both overall and for each tone, was comparable with that of the human listeners.

Conclusions: The neural network can be used to evaluate the tone production of Mandarin Chinese-speaking adults with recognition accuracy comparable to that of human listeners, and the tone 1-based normalization procedure is what brings its performance to that level. The success of the neural network in recognizing tones from multiple speakers supports its utility for evaluating tone production. Further testing of the neural network with hearing-impaired speakers might reveal its potential use for clinical evaluation of tone production.
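The input pipeline described in the Design section can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the tone 1-based normalization, so the sketch assumes it means dividing each F0 value by the speaker's mean tone 1 F0. The function names, the tanh and softmax activations, and the random (untrained) weights are also assumptions made for illustration. Only the three-input, four-hidden-neuron size comes from the Results, and the four output classes correspond to the four Mandarin lexical tones.

```python
import numpy as np


def sample_contour(f0, n_points=3):
    """Pick n_points evenly spaced samples from a voiced F0 contour (Hz)."""
    f0 = np.asarray(f0, dtype=float)
    idx = np.linspace(0, len(f0) - 1, n_points).round().astype(int)
    return f0[idx]


def tone1_normalize(f0, tone1_mean_hz):
    """ASSUMED definition: express F0 relative to the speaker's mean tone 1 F0."""
    return np.asarray(f0, dtype=float) / tone1_mean_hz


def first_derivative(f0):
    """First-order difference between successive F0 samples."""
    return np.diff(np.asarray(f0, dtype=float))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: hidden tanh layer, then softmax over the four tones."""
    h = np.tanh(w_hidden @ x + b_hidden)
    z = w_out @ h + b_out
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


# Two hypothetical rising contours with the same shape in different registers.
low_voice = sample_contour([100, 105, 110, 120, 130])
high_voice = sample_contour([200, 210, 220, 240, 260])

# Raw inputs differ by a factor of two; the normalized inputs coincide.
low_norm = tone1_normalize(low_voice, tone1_mean_hz=120.0)
high_norm = tone1_normalize(high_voice, tone1_mean_hz=240.0)

# Untrained network with 3 inputs, 4 hidden neurons, and 4 outputs.
rng = np.random.default_rng(0)
w_hidden, b_hidden = rng.normal(size=(4, 3)), np.zeros(4)
w_out, b_out = rng.normal(size=(4, 4)), np.zeros(4)
tone_probs = forward(low_norm, w_hidden, b_hidden, w_out, b_out)
```

Under this assumed definition, the two speakers' normalized inputs are identical even though their raw F0 values differ by an octave, which is the kind of cross-speaker (and cross-gender) variation the normalization is meant to remove. The first-order derivative also removes a constant offset in F0 but, unlike the ratio, does not remove a multiplicative difference in register.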
