Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales

The past decade has witnessed a groundbreaking rise of machine learning for human language analysis, with current methods capable of automatically and accurately recovering various aspects of syntax and semantics, including sentence structure and grounded word meaning, from large data collections. Recent research has shown the promise of such tools for analyzing acoustic communication in non-human species. We posit that machine learning will be the cornerstone of future collection, processing, and analysis of multimodal streams of data in animal communication studies, including bioacoustic, behavioral, biological, and environmental data. Cetaceans are unique non-human model species: they possess sophisticated acoustic communication but use a very different encoding system, one that evolved in an aquatic rather than a terrestrial medium. Sperm whales in particular, with their highly developed neuroanatomical features, cognitive abilities, social structures, and discrete click-based encoding, make an excellent starting point for developing advanced machine learning tools that can be applied to other animals in the future. This paper details a roadmap toward this goal based on currently existing technology and a multidisciplinary scientific community effort. We outline the key elements required for collecting and processing massive bioacoustic data of sperm whales, detecting their basic communication units and language-like higher-level structures, and validating these models through interactive playback experiments. The technological capabilities developed by such an undertaking are likely to yield cross-applications and advancements in broader communities investigating non-human communication and animal behavioral research.
