Deep convolutional network for animal sound classification and source attribution using dual audio recordings.

This paper introduces an end-to-end feedforward convolutional neural network that reliably classifies both the source and the type of animal calls in a noisy environment from two streams of audio data, after training on a modestly sized dataset with imperfect labels. The data consist of audio recordings from captive marmoset monkeys housed in pairs, with several other cages nearby. The network identifies both the call type and the animal that produced it in a single pass through a single network, using raw spectrogram images as input. It vastly increases data analysis capacity for researchers studying marmoset vocalizations and allows data to be collected in the home cage from group-housed animals.
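As a rough illustration of the idea of one forward pass over two audio streams, the sketch below stacks two spectrograms as input channels, applies a single convolution with ReLU, pools, and reads out two sets of probabilities (call type and source animal). It is not the architecture from the paper: the layer sizes, the two-head output, the number of classes, and all names (`init_params`, `forward`, `N_CALL_TYPES`, `N_SOURCES`) are assumptions made for illustration only.

```python
import numpy as np

N_CALL_TYPES = 4   # hypothetical number of call classes
N_SOURCES = 2      # hypothetical: which of the two recorded animals called
N_FILTERS = 8      # hypothetical number of convolution filters


def conv2d_valid(x, w):
    """Valid 2-D convolution. x: (C_in, H, W); w: (C_out, C_in, kH, kW)."""
    c_out, _, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - kh + 1, wd - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + kh, j:j + kw]
            # Contract each filter with the patch over (C_in, kH, kW).
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()


def init_params(rng):
    """Random, untrained weights; a real model would learn these."""
    return {
        "conv": rng.normal(0.0, 0.1, (N_FILTERS, 2, 3, 3)),
        "call_head": rng.normal(0.0, 0.1, (N_CALL_TYPES, N_FILTERS)),
        "source_head": rng.normal(0.0, 0.1, (N_SOURCES, N_FILTERS)),
    }


def forward(spec_a, spec_b, params):
    """One pass: two spectrograms in, (call-type probs, source probs) out."""
    x = np.stack([spec_a, spec_b])                         # (2, freq, time)
    h = np.maximum(conv2d_valid(x, params["conv"]), 0.0)   # convolution + ReLU
    pooled = h.mean(axis=(1, 2))                           # global average pool
    call_probs = softmax(params["call_head"] @ pooled)
    source_probs = softmax(params["source_head"] @ pooled)
    return call_probs, source_probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    # Stand-in spectrograms (frequency bins x time frames) for the two streams.
    spec_a = rng.random((16, 20))
    spec_b = rng.random((16, 20))
    call_probs, source_probs = forward(spec_a, spec_b, params)
    print(call_probs.shape, source_probs.shape)
```

Feeding both streams as channels of one input lets a single set of filters compare the two recordings directly, which is one plausible way to attribute a call to its source; the paper may resolve this differently.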
