Testing the Universal Baby Language Hypothesis - Automatic Infant Speech Recognition with CNNs

This paper applies convolutional neural networks (CNNs) to the recognition of the so-called “Dunstan baby language”, which consists of five “words” or phonemes that babies under 3 months of age use to communicate their needs before they start crying. The model is derived from a CNN architecture that the authors previously applied successfully to voice-based emotion detection. The input to the network is the spectrogram computed from audio recordings of babies' voices, which is processed as a two-dimensional image. The network was trained on a set of 250 short recordings and tested on 65 other recordings, reaching a recognition rate of 89%. All audio files are shorter than 1 second and were extracted from certified Dunstan language recordings. The main original contribution of the paper is the recognition of the actual “baby words” rather than the baby cry, as was done in earlier work. The architecture offers an efficient tool for verifying the “universal baby language” hypothesis, according to which the language of infants does not depend on culture, family, etc.
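To illustrate the idea of treating a spectrogram as a two-dimensional image, the following is a minimal sketch in Python with NumPy. It is not the authors' preprocessing pipeline: the sample rate, frame length, hop size, Hann window, and the function name `log_spectrogram` are all assumptions chosen for illustration, and a synthetic tone stands in for a real recording.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Return a (freq_bins, frames) log-magnitude spectrogram of a 1-D signal.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Zero-pad very short clips so that at least one full frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Magnitude of the one-sided FFT of each frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels and put frequency on the first axis.
    return 20.0 * np.log10(magnitude + eps).T


# Synthetic stand-in for a recording shorter than 1 second (assumed 8 kHz).
sample_rate = 8000
t = np.arange(int(0.8 * sample_rate)) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = log_spectrogram(tone)
# Add batch and channel axes so the array has the layout of a grayscale image batch.
image = spec[np.newaxis, :, :, np.newaxis]
print(spec.shape, image.shape)
```

The resulting array can be fed to any image-style CNN as a single-channel input. In practice the clips would need a common length (by padding or cropping) so that every spectrogram has the same shape.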
