Latent space visualization, characterization, and generation of diverse vocal communication signals

Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species’ vocal behavior. Even with a great deal of experience, human characterizations of animal communication can be affected by human perceptual biases. We present a set of computational methods centered on projecting animal vocalizations into low-dimensional latent representational spaces that are learned directly from data. We apply these methods to diverse datasets from over 20 species, including humans, bats, songbirds, mice, cetaceans, and nonhuman primates, enabling high-powered comparative analyses of unbiased acoustic features in communicative repertoires across species. Latent projections uncover complex features of the data in visually intuitive and quantifiable ways. We introduce methods for analyzing vocalizations both as discrete sequences and as continuous latent variables. Each method can be used to disentangle complex spectro-temporal structure and to observe long-timescale organization in communication. Finally, we show how systematic sampling from latent representational spaces of vocalizations enables comprehensive investigations of perceptual and neural representations of complex and ecologically relevant acoustic feature spaces.
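As a rough illustration of the two ideas named above (projecting vocal elements into a low-dimensional latent space, then sampling systematically from that space), the following minimal sketch uses toy data and a linear projection. Everything in it is an assumption made for illustration: the synthetic "spectrograms", the helper `synth_syllable`, the array sizes, and the use of PCA (via SVD) as a simple stand-in for the nonlinear, learned projections the abstract refers to. It is not the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FREQ, N_TIME = 32, 16  # hypothetical spectrogram size (frequency bins x time frames)


def synth_syllable(center_bin):
    """Toy 'spectrogram': a Gaussian energy band at center_bin plus noise."""
    freqs = np.arange(N_FREQ)[:, None]
    band = np.exp(-0.5 * ((freqs - center_bin) / 2.0) ** 2)
    spec = np.repeat(band, N_TIME, axis=1)
    return spec + 0.05 * rng.standard_normal((N_FREQ, N_TIME))


# Two hypothetical syllable types, 50 noisy renditions each.
specs = np.stack([synth_syllable(8) for _ in range(50)]
                 + [synth_syllable(24) for _ in range(50)])
labels = np.array([0] * 50 + [1] * 50)

# Flatten each spectrogram to a vector and project to 2-D.
# PCA is a linear stand-in here, not the method used in the paper.
X = specs.reshape(len(specs), -1)
mean = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
components = vt[:2]                      # top two principal axes
latent = (X - mean) @ components.T       # shape (100, 2)

# Systematic sampling: walk a straight line between the two class
# centroids in latent space and map each point back to a spectrogram.
z_a = latent[labels == 0].mean(axis=0)
z_b = latent[labels == 1].mean(axis=0)
steps = np.linspace(0.0, 1.0, 5)
z_path = (1 - steps)[:, None] * z_a + steps[:, None] * z_b
morphs = (z_path @ components + mean).reshape(-1, N_FREQ, N_TIME)

print(latent.shape, morphs.shape)
```

With a linear projection such as this one, the intermediate samples are simple cross-fades between the two endpoint spectrograms; a nonlinear generative model would generally produce different intermediate stimuli.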
