Robust Recognition of Specific Human Behaviors in Crowded Surveillance Video Sequences

We describe a method for detecting specific human behaviors in crowded surveillance video. Our system recognizes behaviors from the trajectories obtained by detecting and tracking people in the video. It detects people using a HOG descriptor with an SVM classifier and tracks the detected regions using two-dimensional color histograms. It identifies several specific behaviors, such as running and meeting, by analyzing the similarity of each trajectory to a reference trajectory for that behavior. Verification techniques such as backward tracking and optical flow calculation contributed to robust recognition. Comparative experiments showed that our system tracked people more robustly than a baseline tracking algorithm, even in crowded scenes. Our system precisely identified specific behaviors and achieved first place for detecting running people in the TRECVID 2009 Surveillance Event Detection Task.
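The histogram-based tracking step can be illustrated with a minimal sketch. The abstract does not say which two color channels, how many bins, or which similarity measure the system uses, so the choices below (two generic 8-bit channels, 8 bins per axis, the Bhattacharyya coefficient) are assumptions, and the function names are hypothetical. The sketch builds a normalized two-dimensional histogram for a tracked region and picks the candidate region in the next frame whose histogram is most similar.

```python
import math
from typing import List, Sequence, Tuple

# A pixel is reduced to two color-channel values in 0..255.
# Which two channels the original system uses is not stated; this is an assumption.
Pixel = Tuple[int, int]
Histogram = List[List[float]]


def color_histogram_2d(pixels: Sequence[Pixel], bins: int = 8) -> Histogram:
    """Return a normalized bins x bins histogram over two color channels."""
    hist = [[0.0] * bins for _ in range(bins)]
    if not pixels:
        return hist
    width = 256 / bins
    for a, b in pixels:
        row = min(int(a / width), bins - 1)
        col = min(int(b / width), bins - 1)
        hist[row][col] += 1.0
    total = float(len(pixels))
    return [[count / total for count in row] for row in hist]


def bhattacharyya_coefficient(h1: Histogram, h2: Histogram) -> float:
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(
        math.sqrt(p * q)
        for row1, row2 in zip(h1, h2)
        for p, q in zip(row1, row2)
    )


def best_match(target: Histogram,
               candidates: Sequence[Sequence[Pixel]],
               bins: int = 8) -> int:
    """Return the index of the candidate region most similar to the target."""
    scores = [
        bhattacharyya_coefficient(target, color_histogram_2d(c, bins))
        for c in candidates
    ]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy regions: the tracked person, the same person in the next frame,
    # and an unrelated region with different colors.
    person = [(200, 40)] * 50 + [(30, 30)] * 50
    same_person = [(205, 45)] * 60 + [(28, 25)] * 40
    other = [(90, 220)] * 100
    target_hist = color_histogram_2d(person)
    print(best_match(target_hist, [other, same_person]))
```

In a full tracker the candidate regions would come from the person detector in each new frame, and a match below some similarity threshold would likely be rejected or passed to a verification step such as the backward tracking mentioned above.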
