Shape-based posture and gesture recognition in videos

The posture or gesture of a person is highly relevant semantic information in videos and surveillance systems. We present a new three-step approach to classifying the posture or gesture of a person, based on segmentation, classification, and aggregation. In the first step, a background image is constructed from successive frames using motion compensation, and the shapes of people are segmented by comparing the background image with each frame. In the second step, we classify each shape with a modified curvature scale space (CSS) approach. A major drawback of the standard CSS approach is its poor representation of convex segments in shapes: convex objects cannot be represented at all, since they have no inflection points. We have extended the CSS approach to generate feature points for both the concave and convex segments of a shape. The key idea is to reflect each contour pixel and map the original shape to a second one whose curvature is reversed: strongly convex segments in the original shape are mapped to concave segments in the second one, and vice versa. For each shape, a CSS image is generated whose feature points characterize the shape of a person very well. The last step aggregates the matching results. A transition matrix defines which transitions between adjacent frames are possible; for example, a person who is sitting on a chair in one frame cannot be walking in the next. Such a transition is valid only if the posture is classified as "standing-up" in several intermediate frames. We present promising results and compare the classification rates of postures and gestures for the standard CSS approach and our new approach.
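
The aggregation step can be illustrated with a minimal sketch. The abstract does not list the posture classes, the entries of the transition matrix, or the number of frames required, so the labels in `POSTURES`, the table `ALLOWED`, the function `aggregate`, and the parameter `min_frames` below are all assumptions chosen for illustration, not the authors' implementation.

```python
# Hypothetical posture labels; the source does not enumerate its classes.
POSTURES = ["sitting", "standing-up", "standing", "walking"]
INDEX = {name: i for i, name in enumerate(POSTURES)}

# Hypothetical transition matrix: ALLOWED[i][j] is True if posture i may be
# followed by posture j. The table is deliberately small and incomplete.
ALLOWED = [
    # sitting  standing-up  standing  walking
    [True,     True,        False,    False],  # from sitting
    [True,     True,        True,     False],  # from standing-up
    [False,    False,       True,     True],   # from standing
    [False,    False,       True,     True],   # from walking
]


def aggregate(frame_labels, min_frames=3):
    """Smooth per-frame labels using the transition matrix.

    A change of posture is accepted only if the matrix allows it and the new
    label has been seen in `min_frames` consecutive frames (an assumed
    threshold). Labels that would need a forbidden transition are treated as
    misclassifications and replaced by the current posture.
    """
    if not frame_labels:
        return []
    current = frame_labels[0]
    candidate, run = None, 0
    result = [current]
    for label in frame_labels[1:]:
        if label == current:
            candidate, run = None, 0
        elif ALLOWED[INDEX[current]][INDEX[label]]:
            if label == candidate:
                run += 1
            else:
                candidate, run = label, 1
            if run >= min_frames:
                current = label
                candidate, run = None, 0
        else:
            # Forbidden transition, e.g. sitting -> walking: ignore the frame.
            candidate, run = None, 0
        result.append(current)
    return result


if __name__ == "__main__":
    labels = ["sitting", "walking", "sitting",
              "standing-up", "standing-up", "standing-up",
              "standing", "standing", "standing"]
    print(aggregate(labels))
```

In this sketch, an isolated "walking" label directly after "sitting" is discarded, while a sustained run of "standing-up" labels moves the state forward. Requiring a run of frames is one simple way to model the "several frames" condition; the paper may aggregate the matching results differently.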
