Automatic assessment of language background in toddlers through phonotactic and pitch pattern modeling of short vocalizations

This study uses phonotactic and pitch pattern modeling to automatically assess toddlers’ language background from short vocalization segments. The experiments are conducted on audio recordings of twelve 25–31-month-old US-born and Shanghainese toddlers. Each recording captures the whole-day soundtrack of an ordinary day in a toddler’s life, spent in the child’s natural environment. In a preliminary study, we observed that, despite the limited linguistic content of vocalizations at this early age, certain phonotactic and prosodic patterns were correlated with the child’s language background. In the current effort, we analyze to what extent these language-salient cues can be leveraged for automatic language background classification. In addition to traditional parallel phone recognition with statistical language modeling (PPRLM) and phone recognition with support vector machines (PRSVM), we propose a novel scheme that uses pitch patterns (PPSVM). The classification results on very short vocalizations (less than 3 seconds long on average) confirm that both phonotactic and prosodic features capture language-specific content, reaching equal error rates (EER) of 32.45% for PRSVM, 31.33% for PPSVM, and 29.97% for a fusion of the PRSVM and PPSVM systems. The competitive performance of PPSVM suggests that pitch contours carry a significant portion of the language-specific information in toddlers’ vocalizations.
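For readers unfamiliar with the metric, the following minimal Python sketch shows one common way to estimate an equal error rate from classifier scores, together with a simple weighted-sum score fusion. It is illustrative only: the function names `equal_error_rate` and `fuse_scores`, the threshold sweep, and the equal fusion weight are assumptions made here, and the abstract does not state how the EER was computed or how the PRSVM and PPSVM systems were fused.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumption: higher scores mean "more likely the target language".
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.unique(np.concatenate([target, nontarget]))
    # Miss rate: fraction of target trials scoring below the threshold.
    miss = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials at or above the threshold.
    false_alarm = np.array([(nontarget >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(miss - false_alarm))
    return (miss[idx] + false_alarm[idx]) / 2.0

def fuse_scores(scores_a, scores_b, weight=0.5):
    """Hypothetical score-level fusion: a weighted sum of two systems' scores."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight * a + (1.0 - weight) * b

if __name__ == "__main__":
    # Toy scores, not data from the study.
    tgt = [0.1, 0.6, 0.7, 0.8]
    non = [0.2, 0.3, 0.4, 0.9]
    print(f"EER = {100 * equal_error_rate(tgt, non):.2f}%")
```

In practice, the scores of the two systems would usually be normalized to a comparable range before fusion, and the fusion weight would be tuned on held-out data.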
