Spatiotemporal Alignment of Visual Signals on a Special Manifold

We investigate the spatiotemporal alignment of videos, signals, or feature sequences extracted from them. Specifically, we consider the scenario in which the spatiotemporal misalignments can be characterized by parametric transformations. Using a nonlinear analytical structure referred to as an alignment manifold, we formulate alignment as an optimization problem on this nonlinear space. We focus on semantically meaningful videos or signals, e.g., those describing or capturing human motion or activities, and propose a new formalism for temporal alignment that accounts for execution-rate variations among instances of the same video event. Our strategy bridges geometric optimization and stochastic algorithms: we regard the search for optimal alignment parameters as a recursive state estimation problem for a particular dynamic system evolving on the alignment manifold. We then design a Sequential Importance Sampling procedure on the alignment manifold to carry out this search. We further extend the basic Sequential Importance Sampling algorithm into a new version, Stochastic Gradient Sequential Importance Sampling, which incorporates a steepest descent structure on the alignment manifold and provides a more efficient particle propagation mechanism. We demonstrate the performance of manifold-based alignment on several types of input data that arise in vision problems.
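
To make the idea of gradient-assisted particle propagation concrete, the following is a minimal toy sketch and not the authors' algorithm. It assumes a one-dimensional signal whose temporal misalignment is the affine warp u = exp(s) * t + b, and it uses the flat (s, b) chart as a stand-in for the alignment manifold, which this abstract does not specify. The reference signal `bump`, the function names, the exponential cost-based weights, and all numeric settings are hypothetical choices made for illustration only.

```python
import numpy as np

def bump(u):
    """Toy reference signal: a Gaussian bump centred at u = 0.5 (assumed data)."""
    return np.exp(-0.5 * ((u - 0.5) / 0.1) ** 2)

def cost(theta, t, observed):
    """Mean squared mismatch after warping time by u = exp(s) * t + b."""
    s, b = theta
    return float(np.mean((bump(np.exp(s) * t + b) - observed) ** 2))

def grad(theta, t, observed, eps=1e-5):
    """Central-difference gradient of the cost in the (s, b) chart."""
    g = np.zeros(2)
    for i in range(2):
        d = np.zeros(2)
        d[i] = eps
        g[i] = (cost(theta + d, t, observed) - cost(theta - d, t, observed)) / (2 * eps)
    return g

def sg_sis_align(t, observed, n_particles=300, n_steps=40, step=0.02,
                 temperature=2e-3, seed=0):
    """Particle search over (s, b): descent step, diffusion, reweight, resample."""
    rng = np.random.default_rng(seed)
    # Initial particles scattered around the identity warp (s = 0, b = 0).
    particles = rng.normal(0.0, [0.3, 0.1], size=(n_particles, 2))
    for k in range(n_steps):
        noise = 0.02 * 0.9 ** k  # diffusion shrinks as the search proceeds
        grads = np.array([grad(p, t, observed) for p in particles])
        # Propagation: steepest-descent move plus random diffusion.
        particles = particles - step * grads + noise * rng.normal(size=particles.shape)
        costs = np.array([cost(p, t, observed) for p in particles])
        # Importance weights from the alignment cost (assumed exponential form).
        weights = np.exp(-(costs - costs.min()) / temperature)
        weights /= weights.sum()
        # Multinomial resampling to counter weight degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
    costs = np.array([cost(p, t, observed) for p in particles])
    return particles[np.argmin(costs)]  # report the lowest-cost particle

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 200)
    observed = bump(1.2 * t - 0.05)  # synthetic rate change and offset
    s_hat, b_hat = sg_sis_align(t, observed)
    print("estimated rate:", np.exp(s_hat), "estimated offset:", b_hat)
    print("cost before:", cost(np.zeros(2), t, observed),
          "cost after:", cost(np.array([s_hat, b_hat]), t, observed))
```

Parameterizing the rate as exp(s) keeps it positive without an explicit constraint, which is one simple way to respect the structure of the transformation group. A faithful implementation would replace the flat chart and finite-difference gradient with the geometry of the actual alignment manifold.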
