Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires

Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species’ vocal behavior. Even with extensive experience, human characterizations of animal communication can be affected by human perceptual biases. We present a set of computational methods for projecting animal vocalizations into low-dimensional latent representational spaces that are learned directly from the spectrograms of vocal signals. We apply these methods to diverse datasets from over 20 species, including humans, bats, songbirds, mice, cetaceans, and nonhuman primates. Latent projections uncover complex features of data in visually intuitive and quantifiable ways, enabling high-powered comparative analyses of vocal acoustics. We introduce methods for analyzing vocalizations both as discrete sequences and as continuous latent variables. Each method can be used to disentangle complex spectro-temporal structure and observe long-timescale organization in communication.
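
As a rough illustration of what projecting spectrograms into a low-dimensional space can look like, the sketch below computes magnitude spectrograms for a few synthetic "calls", flattens each one into a feature vector, and projects the vectors to two dimensions. It is a minimal sketch under stated assumptions, not the method described above: linear PCA (via power iteration) is swapped in for the learned projection, the pure-tone signals are hypothetical stand-ins for real recordings, and every function name and parameter (`spectrogram`, `pca_project`, `frame_len`, `hop`) is invented for this example.

```python
import cmath
import math


def spectrogram(signal, frame_len=32, hop=16):
    """Magnitude spectrogram from a naive DFT over sliding frames."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(frame)))
            for k in range(frame_len // 2)
        ])
    return frames


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pca_project(rows, n_components=2, n_iter=100):
    """Project rows onto their top principal components (power iteration)."""
    dim = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    components = []
    for c in range(n_components):
        v = [math.sin(j + 1 + c) for j in range(dim)]  # deterministic start
        for _ in range(n_iter):
            scores = [dot(row, v) for row in centered]
            # w = X^T X v, computed without forming the covariance matrix
            w = [sum(s * row[j] for s, row in zip(scores, centered))
                 for j in range(dim)]
            for u in components:  # deflate: remove components already found
                d = dot(w, u)
                w = [wj - d * uj for wj, uj in zip(w, u)]
            norm = math.sqrt(dot(w, w)) or 1.0
            v = [wj / norm for wj in w]
        components.append(v)
    return [[dot(row, u) for u in components] for row in centered]


# Hypothetical data: two "call types" as pure tones near DFT bins 4 and 10.
labels, features = [], []
for label, base_bin in (("A", 4.0), ("B", 10.0)):
    for i in range(5):
        freq = base_bin + 0.05 * i  # small within-type variation
        signal = [math.sin(2 * math.pi * freq * n / 32) for n in range(128)]
        spec = spectrogram(signal)
        features.append([v for frame in spec for v in frame])  # flatten
        labels.append(label)

embedding = pca_project(features)
for label, point in zip(labels, embedding):
    print(label, [round(x, 2) for x in point])
```

In this toy setting the two call types should fall on opposite sides of the first axis, which is the kind of visually separable structure a latent projection is meant to expose. A nonlinear embedding would replace `pca_project` in practice; the spectrogram-then-flatten-then-project pipeline is the part this sketch is meant to convey.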
