Characterizing Humans on Riemannian Manifolds

In surveillance applications, the head and body orientation of people is of primary importance for assessing many behavioral traits. Unfortunately, in this context people are often represented by only a few noisy pixels, which makes them difficult to characterize. We address this issue by proposing a computational framework based on an expressive descriptor, the covariance of features. Covariances have previously been employed for pedestrian detection, which is a binary classification problem on Riemannian manifolds. In this paper, we show how to extend this approach to the multiclass case, presenting a novel descriptor, named weighted array of covariances, that is especially suited to tiny image representations. The extension requires a novel differential geometry approach in which covariances are projected onto a single tangent space where standard machine learning techniques can be applied. In particular, we adopt the Campbell-Baker-Hausdorff expansion to approximate, on the tangent space, the true (geodesic) distances on the manifold very efficiently. We test our methodology on multiple benchmark datasets and also propose new test sets, obtaining convincing results in all cases.
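The tangent-space idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the identity matrix as the single base point of the tangent space, and it keeps only the first-order term of the Campbell-Baker-Hausdorff expansion, under which the affine-invariant geodesic distance ||log(X^{-1/2} Y X^{-1/2})||_F is approximated by ||log Y - log X||_F. All function names are hypothetical, and the weighted array of covariances itself is not reproduced here.

```python
import numpy as np


def spd_log(m):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    m = (m + m.T) / 2.0                      # guard against round-off asymmetry
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.T             # V diag(log w) V^T


def spd_inv_sqrt(m):
    """Inverse square root of a symmetric positive-definite matrix."""
    m = (m + m.T) / 2.0
    w, v = np.linalg.eigh(m)
    return (v / np.sqrt(w)) @ v.T            # V diag(w^-1/2) V^T


def region_covariance(features, eps=1e-6):
    """Covariance descriptor of an (n_pixels, d) feature array.

    eps is an assumed regularizer that keeps the matrix positive definite.
    """
    c = np.atleast_2d(np.cov(features, rowvar=False))
    return c + eps * np.eye(c.shape[0])


def geodesic_distance(x, y):
    """Affine-invariant geodesic distance between SPD matrices."""
    s = spd_inv_sqrt(x)
    return np.linalg.norm(spd_log(s @ y @ s), "fro")


def tangent_distance(x, y):
    """First-order approximation: Euclidean distance between matrix logs."""
    return np.linalg.norm(spd_log(y) - spd_log(x), "fro")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two tiny "image regions", each with 60 pixels and 4 features per pixel.
    a = region_covariance(rng.normal(size=(60, 4)))
    b = region_covariance(rng.normal(size=(60, 4)) * 1.3)
    print("geodesic:", geodesic_distance(a, b))
    print("tangent :", tangent_distance(a, b))
```

When the two matrices commute, the higher-order terms of the expansion vanish and the two distances coincide; otherwise the tangent-space value is only an approximation. The practical appeal is that each descriptor is mapped through `spd_log` once and can then be fed to any standard vector-space classifier.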
