Recognizing Bird Species in Audio Files Using Transfer Learning

This paper presents a method for identifying bird species in audio recordings. A pre-trained Inception-v3 convolutional neural network was fine-tuned on 36,492 audio recordings representing 1,500 bird species in the context of the BirdCLEF 2017 task. The audio recordings were transformed into spectrograms and further processed by bandpass filtering, noise filtering, and silent region removal. For data augmentation, time shifting, time stretching, pitch shifting, and pitch stretching were applied. The paper shows that fine-tuning a pre-trained convolutional neural network performs better than training a neural network from scratch, and that domain adaptation from the image domain to the audio domain can be applied successfully. The resulting networks were evaluated in the BirdCLEF 2017 task and achieved an official mean average precision (MAP) score of 0.567 for traditional records and 0.496 for records with background species on the test dataset.
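To make the reported metric concrete, the following is a minimal sketch of how a mean average precision score can be computed from ranked species predictions. It assumes the standard definition (the mean, over all test recordings, of the average precision of each ranked species list); it is not the official BirdCLEF evaluation code. The function names, recording IDs, and species labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranked_species: List[str], true_species: Set[str]) -> float:
    """Average precision of one ranked prediction list.

    Assumes ranked_species contains no duplicate labels.
    """
    if not true_species:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, species in enumerate(ranked_species, start=1):
        if species in true_species:
            hits += 1
            # Precision at this rank, counted only where a true species occurs.
            precision_sum += hits / rank
    # True species that never appear in the ranking contribute zero.
    return precision_sum / len(true_species)


def mean_average_precision(
    predictions: Dict[str, List[str]],
    ground_truth: Dict[str, Set[str]],
) -> float:
    """Mean of the per-recording average precision over all recordings."""
    if not ground_truth:
        return 0.0
    total = 0.0
    for recording_id, true_species in ground_truth.items():
        # A recording without predictions is scored as an empty ranking.
        ranked = predictions.get(recording_id, [])
        total += average_precision(ranked, true_species)
    return total / len(ground_truth)


# Hypothetical toy data: three recordings, three candidate species.
ground_truth = {
    "rec_a": {"robin"},
    "rec_b": {"robin"},
    "rec_c": {"wren", "robin"},  # a recording with a background species
}
predictions = {
    "rec_a": ["robin", "wren", "finch"],  # correct species ranked first
    "rec_b": ["wren", "robin", "finch"],  # correct species ranked second
    "rec_c": ["finch", "wren", "robin"],  # both true species ranked low
}

print(round(mean_average_precision(predictions, ground_truth), 3))
```

In this toy example the per-recording average precisions are 1.0, 0.5, and about 0.583, so the mean is about 0.694. A score such as 0.567 therefore indicates that, averaged over recordings, the correct species tend to sit near the top of the ranking without always occupying the first position.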
