A cloud infrastructure for target detection and tracking using audio and video fusion

This paper presents a Cloud-based architecture, called Cloud-based Audio-Video (CAV) fusion, for detecting and tracking multiple moving targets in airborne videos with audio assistance. The innovation of the CAV system is a method that matches tracks against a color feature descriptor obtained from user voice-to-text input, using hue features extracted automatically from image pixels. The CAV approach is general purpose: it detects and tracks the movement of different valuable targets to support suspicious behavior recognition through multi-intelligence data fusion. Using Cloud computing yields real-time performance, in contrast to a single-machine workflow. Multiple moving target tracking results obtained from airborne videos demonstrate that the CAV approach provides an improved frame rate, enhanced detection, and real-time tracking and classification performance under realistic conditions.
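
The color-based track matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hue ranges, the saturation and brightness thresholds, and the names `HUE_RANGES`, `dominant_color`, and `match_tracks` are all assumptions made for illustration. It assumes the spoken request has already been converted to a color word, and that each track is represented by a list of RGB pixels.

```python
import colorsys
from collections import Counter

# Hypothetical hue ranges in degrees for spoken color words (not from the paper).
HUE_RANGES = {
    "red": [(0, 15), (345, 360)],
    "yellow": [(45, 75)],
    "green": [(75, 165)],
    "blue": [(195, 265)],
}


def dominant_color(pixels, min_saturation=0.2, min_value=0.2):
    """Return the color word whose hue range covers the most pixels, or None."""
    votes = Counter()
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Hue is unreliable for gray or very dark pixels, so skip them.
        if s < min_saturation or v < min_value:
            continue
        hue = h * 360.0
        for name, ranges in HUE_RANGES.items():
            if any(lo <= hue < hi for lo, hi in ranges):
                votes[name] += 1
                break
    return votes.most_common(1)[0][0] if votes else None


def match_tracks(spoken_color, tracks):
    """Return the ids of tracks whose dominant hue matches the spoken color word."""
    return [
        track_id
        for track_id, pixels in tracks.items()
        if dominant_color(pixels) == spoken_color
    ]


# Two toy tracks: one reddish target and one bluish target.
tracks = {
    1: [(200, 10, 10)] * 5,
    2: [(10, 10, 200)] * 5,
}
matched = match_tracks("red", tracks)
```

In this sketch each pixel votes for one color word and the track takes the majority label, which is one simple way to turn per-pixel hue into a track-level descriptor; a real system would likely use hue histograms over the detected target region instead.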
