Robust acoustic bird recognition for habitat monitoring with wireless sensor networks

A key approach to studying birds in their natural habitat is continuous surveying with wireless sensor networks (WSN). The ultimate objective of this study is to design a system for monitoring threatened bird species using audio sensor nodes. Birds are recognized principally by their sounds, and the main limitations of this process are environmental noise and the energy consumption of the sensor nodes. Over the years, a variety of birdsong classification methods have been introduced, but very few have focused on finding one suited to WSNs. In this paper, a tonal region detector (TRD) based on a sigmoid function is proposed. This approach to noise power estimation offers flexibility, since the slope and the mean of the sigmoid function can be adapted autonomously to achieve a better trade-off between overestimating and underestimating the noise. Once the tonal regions of the noisy bird sound are detected, gammatone Teager energy cepstral coefficients (GTECC), post-processed by quantile-based cepstral normalization, are extracted from these regions and classified with a deep neural network (DNN). Experimental results on the identification of 36 bird species from Lake Tonga (northeast Algeria) demonstrate that the proposed TRD–GTECC feature is highly effective and performs satisfactorily compared with the popular front-ends considered in this study. Moreover, after tonal region detection, recognition performance and noise immunity are considerably improved and energy consumption is considerably reduced, indicating that the approach is well suited to acoustic bird recognition in complex environments with wireless sensor nodes.
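The sigmoid-based tonal region detector can be illustrated with a minimal sketch. The abstract does not give the detector's equations, so everything below is an assumption: each time-frequency bin's a posteriori SNR (in dB) against a running noise estimate is mapped through a logistic function with slope `slope` and midpoint `midpoint_db` (playing the role of the sigmoid's "mean"), the resulting tonal probability gates a recursive noise-power update, and a frame is flagged as tonal when any bin's probability exceeds `frame_threshold`. The function and parameter names and default values are hypothetical, the slope and midpoint are fixed here rather than adapted autonomously as in the paper, and the first few frames are assumed to be noise-only.

```python
import math
from typing import List, Tuple


def sigmoid(x: float, slope: float, midpoint: float) -> float:
    """Logistic function: `slope` sets the steepness, `midpoint` the 0.5 crossing."""
    z = slope * (x - midpoint)
    # Split on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tonal_region_detector(
    power_spec: List[List[float]],
    slope: float = 1.0,          # hypothetical: steepness per dB of SNR
    midpoint_db: float = 6.0,    # hypothetical: SNR (dB) at which p = 0.5
    alpha: float = 0.9,          # hypothetical: noise smoothing factor
    frame_threshold: float = 0.5,
    init_frames: int = 5,        # assumed noise-only lead-in frames
) -> Tuple[List[bool], List[float]]:
    """Flag tonal frames in a power spectrogram (frames x bins) and return
    the final per-bin noise power estimate. Sketch only, not the paper's TRD."""
    n_bins = len(power_spec[0])
    k = min(init_frames, len(power_spec))
    # Initialise the noise PSD as the mean of the first k frames.
    noise = [sum(frame[b] for frame in power_spec[:k]) / k for b in range(n_bins)]
    eps = 1e-12
    flags: List[bool] = []
    for frame in power_spec:
        frame_max_prob = 0.0
        for b, p in enumerate(frame):
            # A posteriori SNR of this bin against the current noise estimate.
            snr_db = 10.0 * math.log10(max(p, eps) / max(noise[b], eps))
            prob = sigmoid(snr_db, slope, midpoint_db)  # soft "tonal" probability
            frame_max_prob = max(frame_max_prob, prob)
            # Soft-decision recursive update: likely-tonal bins barely change the noise estimate.
            a = alpha + (1.0 - alpha) * prob
            noise[b] = a * noise[b] + (1.0 - a) * p
        flags.append(frame_max_prob > frame_threshold)
    return flags, noise
```

In this sketch, raising `midpoint_db` lets moderate-SNR tonal energy leak into the noise estimate (overvaluation), while lowering it freezes the estimate more often so that it follows rising noise slowly (undervaluation); the paper's autonomous adaptation of the slope and mean targets exactly that trade-off.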
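Two of the GTECC building blocks can be sketched in isolation. The discrete Teager energy operator below follows the standard Kaiser definition, psi[x(n)] = x(n)^2 - x(n-1)x(n+1). The quantile-based cepstral normalization is an assumed form in which each cepstral dimension is centred on the midpoint of its low and high quantiles and scaled by their distance. The quantile level `q`, the helper names, and the omission of the gammatone filterbank and cepstral transform are simplifications for illustration, not the paper's exact front end.

```python
import math
from typing import List, Sequence


def teager_energy(x: Sequence[float]) -> List[float]:
    """Discrete Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    The output has len(x) - 2 samples (the two end points are dropped).
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between sorted samples, q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def quantile_cepstral_normalization(
    ceps: List[List[float]], q: float = 0.05  # hypothetical quantile level
) -> List[List[float]]:
    """Normalise each cepstral dimension (column) of a frames x dims matrix:
    centre on the midpoint of its q and 1-q quantiles, scale by their distance."""
    n_frames, n_dims = len(ceps), len(ceps[0])
    out = [[0.0] * n_dims for _ in range(n_frames)]
    for d in range(n_dims):
        column = [frame[d] for frame in ceps]
        q_lo, q_hi = quantile(column, q), quantile(column, 1.0 - q)
        centre = 0.5 * (q_lo + q_hi)
        spread = q_hi - q_lo
        for t in range(n_frames):
            shifted = column[t] - centre
            # Leave a constant dimension centred but unscaled to avoid dividing by zero.
            out[t][d] = shifted / spread if spread > 0 else shifted
    return out
```

For a pure tone A cos(Wn + phi) the operator returns the constant A^2 sin^2(W), so it responds jointly to amplitude and frequency. Using quantiles instead of the mean and variance makes the normalization less sensitive to outlier frames, and it is invariant to a positive gain and offset applied to a cepstral dimension.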
