Conditional State Space Models for Discriminative Motion Estimation

We consider the problem of predicting a sequence of real-valued multivariate states from a given measurement sequence. Its typical application in computer vision is the task of motion estimation. State Space Models are widely used generative probabilistic models for the problem. Instead of jointly modeling states and measurements, we propose a novel discriminative undirected graphical model which conditions the states on the measurements while exploiting the sequential structure of the problem. The major benefits of this approach are: (1) It focuses on the ultimate prediction task while avoiding probably unnecessary effort in modeling the measurement density, (2) It relaxes generative models' assumption that the measurements are independent given the states, and (3) The proposed inference algorithm takes linear time in the measurement dimension as opposed to the cubic time for Kalman filtering, which allows us to incorporate large numbers of measurement features. We show that the parameter learning can be cast as an instance of convex optimization. We also provide efficient convex optimization methods based on theorems from linear algebra. The performance of the proposed model is evaluated on both synthetic data and the human body pose estimation from silhouette videos.

[1]  J. Engwerda On the existence of a positive definite solution of the matrix equation X + A , 1993 .

[2]  A. Ran,et al.  Necessary and sufficient conditions for the existence of a positive definite solution of the matrix equation X + A*X-1A = Q , 1993 .

[3]  Yaakov Bar-Shalom,et al.  Estimation and Tracking: Principles, Techniques, and Software , 1993 .

[4]  Geoffrey E. Hinton,et al.  Parameter estimation for linear dynamical systems , 1996 .

[5]  Zoubin Ghahramani,et al.  Learning Nonlinear Dynamical Systems Using an EM Algorithm , 1998, NIPS.

[6]  Andrew McCallum,et al.  Maximum Entropy Markov Models for Information Extraction and Segmentation , 2000, ICML.

[7]  Vladimir Pavlovic,et al.  Learning Switching Linear Models of Human Motion , 2000, NIPS.

[8]  Michael Isard,et al.  Learning and Classification of Complex Dynamics , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Rudolph van der Merwe,et al.  The square-root unscented Kalman filter for state and parameter-estimation , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[10]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[11]  Yee Whye Teh,et al.  An Alternate Objective Function for Markovian Fields , 2002, ICML.

[12]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[13]  David J. Fleet,et al.  Robust Online Appearance Models for Visual Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Xiaojin Zhu,et al.  Kernel conditional random fields: representation and clique selection , 2004, ICML.

[15]  Ahmed M. Elgammal,et al.  Inferring 3D body pose from silhouettes using activity manifold learning , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[16]  Cristian Sminchisescu,et al.  Generative modeling for continuous non-linearly embedded visual inference , 2004, ICML.

[17]  Rui Li,et al.  Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[18]  Trevor Darrell,et al.  Learning appearance manifolds from video , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[19]  Cristian Sminchisescu,et al.  Conditional Visual Tracking in Kernel Space , 2005, NIPS.

[20]  David J. Fleet,et al.  Priors for people tracking from small training sets , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[21]  Sebastian Thrun,et al.  Discriminative Training of Kalman Filters , 2005, Robotics: Science and Systems.

[22]  Cristian Sminchisescu,et al.  Discriminative density propagation for 3D human motion estimation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[23]  Richard S. Zemel,et al.  Combining discriminative features to infer complex trajectories , 2006, ICML.

[24]  David J. Fleet,et al.  3D People Tracking with Gaussian Process Dynamical Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[25]  Trevor Darrell,et al.  Conditional Random People: Tracking Humans with CRFs and Grid Filters , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[26]  Vladimir Pavlovic,et al.  Discriminative Learning of Dynamical Systems for Motion Tracking , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.