Hear-and-avoid for UAVs using convolutional neural networks

To investigate how an Unmanned Air Vehicle (UAV) can detect manned aircraft with a single microphone, an audio data set is created in which UAV ego-sound and recorded aircraft sound are mixed together, and a convolutional neural network (CNN) is used to perform the air traffic detection. Due to restrictions on flying UAVs close to aircraft, the data set has to be produced artificially: UAV ego-sound is captured separately from aircraft sound, and the two are then mixed, with labels indicating whether each mixed recording contains aircraft audio. The CNN takes MFCCs, the spectrogram, or the Mel spectrogram as input features. For each feature, the effect of the UAV/aircraft amplitude ratio, the type of labeling, the window length, and the addition of aircraft recordings from a third-party sound database is explored. The results show that the best performance is achieved with the Mel spectrogram feature. Performance increases when the UAV/aircraft amplitude ratio is decreased, when the time window is lengthened, or when the data set is extended with aircraft audio recordings from a third-party sound database. Although the number of false positives and false negatives in the presented approach is still too high for real-world application, this study indicates multiple paths forward that can lead to satisfactory performance. In addition, the data set is provided as open access, allowing the community to contribute to improving the detection task.
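The data-set construction described above can be sketched in a few lines: scale an aircraft recording so that the UAV/aircraft amplitude ratio (expressed here in dB, an assumed convention) takes a chosen value, add it to the UAV ego-sound, and compute a magnitude spectrogram as a CNN input feature. This is a minimal illustration using only numpy; the function names, the RMS-based definition of the amplitude ratio, and the STFT parameters are assumptions, not the paper's exact pipeline (which could equally use e.g. librosa's Mel spectrogram).

```python
import numpy as np

def mix_at_ratio(uav, aircraft, ratio_db):
    """Mix aircraft sound into UAV ego-sound at a given UAV/aircraft
    amplitude ratio in dB (RMS-based; hypothetical convention)."""
    rms_uav = np.sqrt(np.mean(uav ** 2))
    rms_air = np.sqrt(np.mean(aircraft ** 2))
    # Gain so that rms_uav / rms(gain * aircraft) == 10**(ratio_db / 20)
    gain = rms_uav / (rms_air * 10 ** (ratio_db / 20))
    return uav + gain * aircraft

def spectrogram(x, n_fft=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed STFT,
    shape (freq_bins, time_frames)."""
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T
```

A window of this spectrogram (or its Mel-filtered variant) would then be labeled positive or negative depending on whether aircraft audio was mixed in, and fed to the CNN.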
