A trainable system for object detection in images and video sequences

This thesis presents a general, trainable system for object detection in static images and video sequences. The core system finds a certain class of objects in static images of completely unconstrained, cluttered scenes without using motion, tracking, or handcrafted models and without making any assumptions about the scene structure or the number of objects in the scene. The system takes a set of positive and negative example images as training data, transforms the pixel images to a Haar wavelet representation, and uses a support vector machine classifier to learn the difference between in-class and out-of-class patterns. To detect objects in out-of-sample images, we perform a brute-force search over all subwindows of the image. This system is applied to face, people, and car detection with excellent results. For our extensions to video sequences, we augment the core static detection system in several ways: (1) extending the representation to five frames, (2) implementing an approximation to a Kalman filter, and (3) modeling detections in an image as a density and propagating this density through time according to measured features. In addition, we present a real-time version of the system that is currently running in a DaimlerChrysler experimental vehicle. As part of this thesis, we also present a system that, instead of detecting full patterns, uses a component-based approach. For people detection, we find it to be more robust to occlusions, rotations in depth, and severe lighting conditions than the full-body version. We also experiment with various other representations, including pixels and principal components, and show results that quantify how the number of features, color, and gray level affect performance. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
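
The core detection loop described above (Haar wavelet features scored by a classifier over every subwindow) can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes a single-level Haar decomposition, a single window scale, and a hand-set linear decision function standing in for a trained support vector machine. The names `haar_features`, `sliding_windows`, and `detect`, along with the toy weights and image, are hypothetical and chosen only for illustration.

```python
def haar_features(patch):
    """Single-level 2D Haar detail coefficients of an even-sized patch.

    `patch` is a list of rows of gray-level values. Each 2x2 block yields
    three coefficients; the block average is dropped (an assumption here).
    """
    horiz, vert, diag = [], [], []
    for r in range(0, len(patch), 2):
        for c in range(0, len(patch[0]), 2):
            a, b = patch[r][c], patch[r][c + 1]          # top-left, top-right
            d, e = patch[r + 1][c], patch[r + 1][c + 1]  # bottom-left, bottom-right
            horiz.append((a + b - d - e) / 4.0)  # top vs. bottom contrast
            vert.append((a - b + d - e) / 4.0)   # left vs. right contrast
            diag.append((a - b - d + e) / 4.0)   # diagonal contrast
    return horiz + vert + diag


def sliding_windows(height, width, win, step):
    """Yield the (top, left) corner of every win x win subwindow."""
    for top in range(0, height - win + 1, step):
        for left in range(0, width - win + 1, step):
            yield top, left


def detect(image, win, step, weights, bias):
    """Brute-force search: score every subwindow, keep positive scores.

    The linear rule `w . x + b > 0` is a stand-in for a trained SVM
    decision function; the weights are supplied by the caller.
    """
    hits = []
    for top, left in sliding_windows(len(image), len(image[0]), win, step):
        patch = [row[left:left + win] for row in image[top:top + win]]
        feats = haar_features(patch)
        score = sum(w * x for w, x in zip(weights, feats)) + bias
        if score > 0:
            hits.append((top, left, score))
    return hits


# Toy usage: a bright vertical stripe in column 2 of a 4x6 image, and
# hypothetical weights that respond only to "left brighter than right".
image = [[0, 0, 1, 0, 0, 0] for _ in range(4)]
hits = detect(image, win=2, step=1, weights=[0.0, 1.0, 0.0], bias=-0.25)
for top, left, score in hits:
    print(top, left, score)
```

A full detector along the lines of the abstract would also repeat the search over multiple image scales and learn the decision function from the positive and negative training examples rather than setting it by hand.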
