Deep Feature Learning for Acoustics-Based Terrain Classification

In order for robots to navigate efficiently in real-world environments, they need to be able to classify and characterize terrain for safe navigation. Most existing terrain classification techniques rely predominantly on visual features. However, vision-based approaches are severely affected by appearance variations and occlusions, so relying on them alone prevents a robot from operating robustly in all conditions. In this paper, we propose an approach that uses the sound produced by vehicle-terrain interactions for terrain classification. We present a new convolutional neural network architecture that learns deep features from spectrograms of an extensive set of audio signals gathered from interactions with various indoor and outdoor terrains. Through exhaustive experiments, we demonstrate that our network significantly outperforms classification approaches based on traditional audio features and achieves state-of-the-art performance. Additional experiments show that the network remains robust when the signals are corrupted with varying amounts of white Gaussian noise, and that fine-tuning with noise-augmented samples significantly boosts the classification rate. Furthermore, we demonstrate that our network performs exceptionally well even on samples recorded with a low-quality mobile phone microphone that adds a substantial amount of environmental noise.
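The abstract names two concrete signal-processing steps: turning audio into spectrograms that the network consumes, and corrupting samples with white Gaussian noise at varying levels, both for the robustness tests and for noise-augmented fine-tuning. Below is a minimal NumPy sketch of both steps under stated assumptions. The frame length (512), hop size (256), Hann window, log-magnitude scaling and the SNR-in-dB parameterization of the noise level are illustrative choices, not values reported by the authors, and the helper names `log_spectrogram` and `add_white_noise` are hypothetical. The sketch does not reproduce the paper's network architecture; the 440 Hz tone merely stands in for a recorded clip.

```python
import numpy as np


def log_spectrogram(signal, frame_len=512, hop=256, eps=1e-10):
    """Log-magnitude STFT spectrogram with shape (freq_bins, time_frames).

    Assumption: frame_len, hop and the Hann window are illustrative,
    not parameters taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Overlapping windowed frames as rows: shape (n_frames, frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One-sided FFT magnitude per frame: shape (n_frames, frame_len // 2 + 1)
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression; transpose so rows are frequency bins, columns are time
    return np.log(magnitude + eps).T


def add_white_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise so that signal power / noise power = snr_db (dB).

    Assumption: the noise level is expressed as an SNR; the paper only
    speaks of "varying amounts" of noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Usage with a synthetic 1 s clip (hypothetical 16 kHz sample rate).
sr = 16000
t = np.arange(sr) / sr
clip = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = add_white_noise(clip, snr_db=10, rng=np.random.default_rng(0))
spec = log_spectrogram(noisy)
print(spec.shape)  # (257, 61)
```

In this sketch, sweeping `snr_db` over several values would produce test sets with different noise levels, and mixing such noisy spectrograms into the training data is one straightforward way to realize noise augmentation before fine-tuning.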
