Matching Seqlets: An Unsupervised Approach for Locality Preserving Sequence Matching

In this paper, we propose a novel unsupervised approach to sequence matching that explicitly accounts for the locality properties of the sequences. In contrast to conventional approaches that rely on frame-to-frame matching, we match sequencelets, or seqlets: sub-sequences whose frames share strong similarities and are therefore grouped together. The optimal seqlets and the matching between them are learned jointly, without any supervision from users. The learned seqlets preserve locality information at the scale of interest and resolve ambiguities during matching, which frame-based matching methods neglect. We show that the proposed approach outperforms state-of-the-art methods on datasets from different domains, including human actions, facial expressions, speech, and character strokes.
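
To make the contrast with frame-to-frame matching concrete, the following is a minimal illustrative sketch and not the method proposed in this paper. It assumes a simple greedy rule for grouping consecutive similar frames and uses standard dynamic time warping over the group means, whereas the paper learns the seqlets and their matching jointly. All function names and the `threshold` parameter are hypothetical.

```python
from math import dist, inf


def split_into_segments(frames, threshold):
    """Greedily group consecutive frames into contiguous segments.

    A frame joins the current segment if it lies within `threshold`
    (Euclidean distance) of that segment's running mean; otherwise it
    starts a new segment. This is an assumed stand-in for seqlet learning.
    """
    if not frames:
        raise ValueError("frames must be non-empty")
    segments = [[frames[0]]]
    for frame in frames[1:]:
        current = segments[-1]
        mean = [sum(c) / len(current) for c in zip(*current)]
        if dist(frame, mean) <= threshold:
            current.append(frame)
        else:
            segments.append([frame])
    return segments


def segment_means(segments):
    """Represent each segment by the mean of its frames."""
    return [[sum(c) / len(seg) for c in zip(*seg)] for seg in segments]


def dtw_cost(a, b):
    """Standard dynamic time warping cost between two vector sequences."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def segment_level_distance(x, y, threshold):
    """Match two sequences at the segment level instead of frame by frame."""
    x_means = segment_means(split_into_segments(x, threshold))
    y_means = segment_means(split_into_segments(y, threshold))
    return dtw_cost(x_means, y_means)


# Two 1-D sequences with the same two "phases" but different durations.
x = [(0.0,), (0.1,), (5.0,), (5.1,)]
y = [(0.0,), (5.0,), (5.0,), (5.0,)]
print(len(split_into_segments(x, 1.0)))
print(segment_level_distance(x, y, 1.0))
```

In this toy case each sequence collapses to two segments, so the differing durations of the two phases do not add to the matching cost the way repeated frame-to-frame comparisons would.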
