Toward Audio Beehive Monitoring: Deep Learning vs. Standard Machine Learning in Classifying Beehive Audio Samples

Electronic beehive monitoring extracts critical information on colony behavior and phenology without invasive beehive inspections or transportation costs. As an integral component of electronic beehive monitoring, audio beehive monitoring has the potential to automate the identification of various stressors for honeybee colonies from beehive audio samples. In this investigation, we designed several convolutional neural networks and compared their performance with four standard machine learning methods (logistic regression, k-nearest neighbors, support vector machines, and random forests) in classifying audio samples from microphones deployed above landing pads of Langstroth beehives. On a dataset of 10,260 audio samples where the training and testing samples were separated from the validation samples by beehive and location, a shallower raw audio convolutional neural network with a custom layer outperformed three deeper raw audio convolutional neural networks without custom layers and performed on par with the four machine learning methods trained to classify feature vectors extracted from raw audio samples. On a more challenging dataset of 12,914 audio samples where the training and testing samples were separated from the validation samples by beehive, location, time, and bee race, all raw audio convolutional neural networks performed better than the four machine learning methods and a convolutional neural network trained to classify spectrogram images of audio samples. A trained raw audio convolutional neural network was successfully tested in situ on a low-voltage Raspberry Pi computer, which indicates that convolutional neural networks can be added to the repertoire of in situ audio classification algorithms for electronic beehive monitoring.
The main trade-off between deep learning and standard machine learning is feature engineering versus training time: the convolutional neural networks required no feature engineering and generalized better on the second, more challenging dataset, but they took considerably longer to train than the machine learning methods. To ensure the replicability of our findings and to provide performance benchmarks for interested research and citizen science communities, we have made our source code and curated datasets public.

Dataset: The curated datasets for this article (BUZZ1 and BUZZ2) are publicly available at https://usu.app.box.com/v/BeePiAudioData; Python source code for data capture is publicly available at https://github.com/VKEDCO/PYPL/tree/master/beepi/py/src/29Jan2016; Python source code for the deep learning and machine learning experiments is publicly available at https://github.com/sarba-jit/EBM_Audio_Classification.
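The abstract notes that validation samples were separated from the training and testing samples by beehive. A minimal sketch of that kind of hive-wise hold-out is given below, purely as an illustration: the sample records, hive identifiers, labels, and the `split_by_hive` helper are all hypothetical and are not taken from the authors' code or from the BUZZ1 and BUZZ2 datasets.

```python
# Hypothetical sample records: (file name, hive id, label).
# The names, hive ids, and labels are placeholders for illustration only.
SAMPLES = [
    ("a1.wav", "hive_A", "bee"),
    ("a2.wav", "hive_A", "noise"),
    ("b1.wav", "hive_B", "bee"),
    ("b2.wav", "hive_B", "noise"),
    ("c1.wav", "hive_C", "bee"),
    ("c2.wav", "hive_C", "noise"),
]


def split_by_hive(samples, validation_hives):
    """Send every sample from a held-out hive to the validation set.

    All remaining samples form the pool from which training and testing
    sets would be drawn, so no hive contributes to both sides.
    """
    train_test, validation = [], []
    for sample in samples:
        hive = sample[1]
        if hive in validation_hives:
            validation.append(sample)
        else:
            train_test.append(sample)
    return train_test, validation


if __name__ == "__main__":
    train_test, validation = split_by_hive(SAMPLES, {"hive_C"})
    print(len(train_test), "training/testing samples")
    print(len(validation), "validation samples")
```

Holding out whole hives, rather than sampling recordings at random, means the validation score reflects performance on a hive the classifier has never heard. This is the kind of generalization the two datasets in the abstract are designed to probe.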
