A multimodal temporal panorama approach for moving vehicle detection, reconstruction, and classification

Moving vehicle detection and classification using multimodal data is challenging with respect to data collection, audio-visual alignment, data labeling, and feature selection, particularly in uncontrolled environments with occlusions, motion blur, varying image resolutions, and perspective distortions. In this work, we propose an effective multimodal temporal panorama (MTP) approach to the task, using a novel long-range audio-visual sensing system. We also create a new audio-visual vehicle (AVV) dataset for moving vehicle detection and classification. It features automatic vehicle detection and audio-visual alignment, accurate vehicle extraction and reconstruction, and efficient data labeling. In particular, each vehicle's visual image is reconstructed once the vehicle is detected, which removes most of the occlusions, motion blur, and variations in perspective. Multimodal audio-visual features are then extracted. These include global geometric features (aspect ratios, profiles), local structure features (histograms of oriented gradients, HOGs), and various audio features (mel-frequency cepstral coefficients, MFCCs, among others). Using support vector machines (SVMs) with radial basis function (RBF) kernels, we thoroughly and systematically study the effectiveness of integrating these multimodal features. The MTP concept is not necessarily limited to visual, motion, and audio modalities; it could also apply to other sensing modalities that acquire data in the temporal domain.
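The abstract does not give the pipeline in code. The following is a minimal Python sketch under stated assumptions, not the authors' implementation.

It makes three assumptions:

- The visual temporal panorama is built by sampling one fixed pixel column from every video frame, so the panorama's horizontal axis becomes time.
- A vehicle is detected when that column differs enough from a background column.
- Modalities are integrated by per-block z-score normalization and concatenation (early fusion) before an RBF kernel is applied.

All function names (`temporal_panorama`, `detect_intervals`, `zscore`, `fuse`, `rbf_kernel`), the threshold test, and the toy numbers are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Frame = List[List[float]]  # one grayscale frame: rows of pixel intensities


def temporal_panorama(frames: Sequence[Frame], col: int) -> List[List[float]]:
    """Stack a fixed pixel column from every frame; panorama[t] is that column at time t.

    Assumption: the sampled column acts as a virtual tripwire that every
    passing vehicle crosses, so time becomes the horizontal panorama axis.
    """
    return [[row[col] for row in frame] for frame in frames]


def detect_intervals(panorama: List[List[float]], background: List[float],
                     thresh: float) -> List[Tuple[int, int]]:
    """Return [start, end) time intervals whose mean absolute difference from
    the background column exceeds `thresh` (a hypothetical presence test)."""
    intervals: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for t, column in enumerate(panorama):
        diff = sum(abs(p - b) for p, b in zip(column, background)) / len(column)
        if diff > thresh and start is None:
            start = t  # a vehicle starts crossing the tripwire
        elif diff <= thresh and start is not None:
            intervals.append((start, t))  # the vehicle has fully passed
            start = None
    if start is not None:  # a vehicle is still crossing at the last frame
        intervals.append((start, len(panorama)))
    return intervals


def zscore(block: List[float], mean: List[float], std: List[float]) -> List[float]:
    """Normalize one modality's feature block; zero-variance features map to 0."""
    return [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(block, mean, std)]


def fuse(blocks: List[List[float]]) -> List[float]:
    """Early (feature-level) fusion: concatenate normalized modality blocks."""
    return [x for block in blocks for x in block]


def rbf_kernel(x: List[float], y: List[float], gamma: float) -> float:
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2), as used by an RBF SVM."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    # Toy data: six 3x4 frames; a bright "vehicle" covers column 1 in frames 2-3.
    frames = [[[0.0] * 4 for _ in range(3)] for _ in range(6)]
    for t in (2, 3):
        for r in range(3):
            frames[t][r][1] = 1.0
    pano = temporal_panorama(frames, col=1)
    print(detect_intervals(pano, background=[0.0] * 3, thresh=0.5))  # [(2, 4)]

    visual = zscore([2.0, 4.0], mean=[1.0, 4.0], std=[1.0, 0.0])  # e.g. geometric/HOG block
    audio = [0.5]  # e.g. an already-normalized MFCC statistic
    x = fuse([visual, audio])
    print(x)  # [1.0, 0.0, 0.5]
    print(rbf_kernel(x, x, gamma=0.1))  # 1.0
```

Normalizing each modality block before concatenation is one common design choice. The RBF kernel depends on squared Euclidean distance, so without normalization a modality with large values or many dimensions (such as a long HOG vector) could dominate the kernel. In practice, the fused vectors would be passed to an off-the-shelf RBF SVM, for example scikit-learn's `SVC(kernel="rbf")`.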

Tao Wang et al., Computer Vision and Image Understanding, 2013.