Parallels in the sequential organization of birdsong and human speech

Human speech possesses a rich hierarchical structure that allows meaning to be altered by words spaced far apart in time. Conversely, the sequential structure of nonhuman communication is thought to follow non-hierarchical Markovian dynamics operating over only short distances. Here, we show that human speech and birdsong share a similar sequential structure indicative of both hierarchical and Markovian organization. We analyze the sequential dynamics of song from multiple songbird species and speech from multiple languages by modeling the information content of signals as a function of the sequential distance between vocal elements. Across short sequence distances, an exponential decay dominates the information in speech and birdsong, consistent with underlying Markovian processes. At longer sequence distances, the decay in information follows a power law, consistent with underlying hierarchical processes. Thus, the sequential organization of acoustic elements in two learned vocal communication signals (speech and birdsong) shows functionally equivalent dynamics, governed by similar processes.
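The core measurement, information as a function of sequential distance, can be illustrated with a small sketch. It assumes a naive plug-in estimator of the mutual information I(d) between the element at position t and the element at position t + d, and hypothetical toy data from a two-state first-order Markov chain, for which I(d) is known to decay exponentially. Rather than fitting nonlinear models (a natural combined form, assumed here only for illustration, is I(d) ≈ a·e^(-d/τ) + b·d^(-α) + c), it uses a simplified diagnostic: an exponential decay is a straight line in (d, log I), whereas a power law is a straight line in (log d, log I). The function names (`mutual_information`, `markov_sequence`, `decay_diagnostic`, and so on) are hypothetical and not taken from the paper.

```python
import numpy as np
from collections import Counter


def mutual_information(seq, d):
    """Plug-in estimate, in bits, of I(X_t; X_{t+d}) for a list of symbols."""
    pairs = list(zip(seq[:-d], seq[d:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) * log2[p(a, b) / (p(a) p(b))], written with raw counts
        mi += (count / n) * np.log2(count * n / (left[a] * right[b]))
    return mi


def mi_curve(seq, max_d):
    """MI at sequential distances d = 1..max_d."""
    return np.array([mutual_information(seq, d) for d in range(1, max_d + 1)])


def markov_sequence(n, flip_prob, seed=0):
    """Hypothetical toy data: a symmetric two-state first-order Markov chain."""
    rng = np.random.default_rng(seed)
    flips = rng.random(n) < flip_prob  # switch state with probability flip_prob
    start = int(rng.integers(2))
    return ((start + np.cumsum(flips)) % 2).tolist()


def theoretical_mi(d, flip_prob):
    """Exact MI for the chain above: 1 - H_b((1 + lam**d) / 2), lam = 1 - 2*flip_prob."""
    p_same = (1 + (1 - 2 * flip_prob) ** d) / 2
    h = -(p_same * np.log2(p_same) + (1 - p_same) * np.log2(1 - p_same))
    return 1.0 - h


def linear_fit_rss(x, y):
    """Least-squares line through (x, y); returns the residual sum of squares."""
    coeffs = np.polyfit(x, y, 1)
    return float(np.sum((y - np.polyval(coeffs, x)) ** 2))


def decay_diagnostic(mi, d):
    """Exponential decay is linear in (d, log I); a power law is linear in (log d, log I)."""
    log_mi = np.log(mi)
    return {
        "exponential_rss": linear_fit_rss(d, log_mi),
        "power_law_rss": linear_fit_rss(np.log(d), log_mi),
    }


seq = markov_sequence(100_000, flip_prob=0.2)
d = np.arange(1, 7)
mi = mi_curve(seq, 6)
scores = decay_diagnostic(mi, d)
print(scores["exponential_rss"] < scores["power_law_rss"])  # True for this Markov chain
```

For real recordings, `seq` would be the sequence of labeled vocal elements (for example, syllables or phones). Plug-in MI is biased upward when the true MI is small, which matters most at long distances, so a baseline such as the MI of a shuffled sequence is often subtracted before comparing decay models.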