A pipeline for identification of bird and frog species in tropical soundscape recordings using a convolutional neural network

Abstract Automated acoustic recorders can collect long-term soundscape data containing species-specific signals in remote environments, and ecologists have increasingly used them to study diverse fauna around the globe. Deep learning methods have gained recent attention for automating species identification in soundscape recordings. We present an end-to-end pipeline for training a convolutional neural network (CNN) for multi-species, multi-label classification of soundscape recordings, starting from raw, unlabeled audio. Training data for species-specific signals are collected using a semi-automated procedure consisting of an efficient template-based signal detection algorithm and a graphical user interface for rapid detection validation. A CNN is then trained on mel-spectrograms of the audio to predict the set of species present in a recording. Transfer learning from a pre-trained model is employed to reduce the required training data and time. Furthermore, we define a loss function that allows both true and false template-based detections to be used in training the multi-species, multi-label classifier. This approach leverages relevant absence (negative) information during training and reduces the effort of creating multi-label training data by allowing weak labels. We evaluated the pipeline using soundscape recordings collected across 749 sites in Puerto Rico. A CNN model was trained to identify 24 regional species of birds and frogs. The semi-automated data collection process greatly reduced the manual effort required for training. The model was evaluated on a held-out set of 1000 randomly sampled 1-min soundscapes from 17 sites in the El Yunque National Forest. The test recordings contained an average of ~3 target species per recording, with a maximum of 8, and showed a large class imbalance: most species were present in <5% of recordings, while others were present in >25%.
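The weak-label loss described above can be sketched as a masked binary cross-entropy: the loss is computed only for species whose template detections were validated (true) or rejected (false), while species with no labeled detection in a recording are ignored. This is a minimal illustration under those assumptions, not the authors' exact implementation; the function name and mask encoding are hypothetical.

```python
import numpy as np

def masked_bce_loss(y_pred, y_true, mask):
    """Binary cross-entropy over labeled entries only.

    y_pred: (N, C) predicted presence probabilities per recording/species
    y_true: (N, C) 1 = validated (true) detection, 0 = rejected (false) detection
    mask:   (N, C) 1 where a template detection was validated or rejected,
            0 where the species' presence in the recording is unknown
    """
    eps = 1e-7
    p = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    bce = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    # Average only over the labeled (masked-in) entries
    return (bce * mask).sum() / np.maximum(mask.sum(), 1)
```

Unlabeled species neither reward nor penalize the model, so a recording need not be exhaustively annotated for every class before it can be used for training.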
The model achieved a mean average precision (mAP) of 0.893 across the 24 species. Pooling predictions across all species, the overall average precision was 0.975.
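The reported mAP is the per-species average precision averaged over classes. A minimal sketch of that metric, using the standard rank-based definition of average precision; this is illustrative only, not the paper's evaluation code:

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one species: precision averaged at each true-positive rank."""
    order = np.argsort(-np.asarray(scores))     # sort by descending score
    labels = np.asarray(labels)[order]
    if labels.sum() == 0:
        return 0.0
    cum_tp = np.cumsum(labels)                  # true positives at each rank
    precision = cum_tp / (np.arange(len(labels)) + 1)
    return float((precision * labels).sum() / labels.sum())

def mean_average_precision(score_matrix, label_matrix):
    """mAP: average precision macro-averaged over species (columns)."""
    aps = [average_precision(score_matrix[:, c], label_matrix[:, c])
           for c in range(score_matrix.shape[1])]
    return float(np.mean(aps))
```

Macro-averaging gives each species equal weight regardless of prevalence, which matters here given the strong class imbalance in the test set.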
