Houston Toad and Other Chorusing Amphibian Species Call Detection Using Deep Learning Architectures

Automatic detection of animal calls in audio recordings, as a tool for protecting species from extinction, is a topic of high interest in bioacoustics. The Houston Toad is an endangered amphibian, and researchers in the Biology Department at Texas State University are working on a project to rescue this species. Their initial approach uses an Automated Recording Device (ARD) that detects only Houston Toad calls. However, it has shown limited success in identifying toad calls: if a call from another species has a frequency spectrum close to that of a Houston Toad, the ARD falsely identifies it as a Houston Toad. Hence, the current ARD solution produces a high number of false positives. This paper proposes a modified ARD solution that detects not only the Houston Toad but also the Gulf Coast Toad, the Crawfish Frog, and the Woodhouse's Toad. It experiments with deep learning architectures, namely Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), combined with dominant audio features such as MFCC, LPC, PLP, Mel-filterbanks, and spectrograms, in an attempt to improve the performance of the toad call identification system and reduce its false-positive rate. This work also proposes a new single-board computing platform, the NVIDIA Jetson Nano, for on-field deployment of the model.
