Study of Robust and Intelligent Surveillance in Visible and Multi-modal Framework

This paper reviews the current state of the art in the development of robust and intelligent surveillance systems, going beyond the traditional vision-based framework to more advanced multi-modal frameworks. The goal of an automated surveillance system is to assist the human operator in scene analysis and event classification by automatically detecting objects and analyzing their behavior using computer vision, pattern recognition, and signal processing techniques. This review covers several advances in these fields while highlighting that realizing a practical end-to-end surveillance system remains difficult because of the challenges encountered in real-world scenarios. With advances in sensor and computing technology, it is now economically and technically feasible to adopt multi-camera and multi-modal frameworks to meet the need for efficient surveillance in a wide range of security applications, such as guarding communities and important buildings, traffic surveillance in cities, and military applications. Our review therefore includes a significant discussion of multi-modal data fusion approaches for robust operation. Finally, we conclude with a discussion of possible future research directions. Summary: Modern robust methods of intelligent surveillance are described.
