Discriminative Transformation for Multi-Dimensional Temporal Sequences

Feature space transformation techniques have been widely studied for dimensionality reduction in vector-based feature spaces. However, these techniques are inapplicable to sequence data because the features within a sequence are not independent. In this paper, we propose a method called max–min inter-sequence distance analysis (MMSDA), which transforms the features in sequences into a low-dimensional subspace such that different sequence classes are holistically separated. To exploit temporal dependencies, MMSDA first aligns the features in sequences from the same class to an adapted number of temporal states and then constructs the sequence class separability from the statistics of these ordered states. To learn the transformation, MMSDA formulates the objective of maximizing the minimal pairwise separability in the latent subspace as a semi-definite programming problem, and provides a new, tractable, and effective solution, supported by theoretical proofs, based on constraint unfolding and pruning, convex relaxation, and within-class scatter compression. Extensive experiments on different tasks demonstrate the effectiveness of MMSDA.
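
To make the max-min objective concrete, the following Python sketch evaluates the minimal pairwise class separability for a given projection matrix. It is an illustration under simplifying assumptions, not the MMSDA procedure itself: each sequence is cut into equal-length segments instead of being aligned to an adapted number of temporal states, separability is taken as the summed squared distance between corresponding projected state means, and the projection `W` is supplied by the caller rather than learned through semi-definite programming. All function names are hypothetical.

```python
import itertools

import numpy as np


def state_means(sequences, n_states):
    """Mean feature vector per temporal state for one class.

    Assumption: each sequence is split into n_states equal-length,
    ordered segments (a stand-in for a learned alignment).
    """
    per_sequence = []
    for seq in sequences:
        seq = np.asarray(seq, dtype=float)
        if seq.shape[0] < n_states:
            raise ValueError("sequence is shorter than the number of states")
        # array_split preserves temporal order and tolerates uneven lengths
        segments = np.array_split(seq, n_states, axis=0)
        per_sequence.append(np.stack([s.mean(axis=0) for s in segments]))
    return np.mean(per_sequence, axis=0)  # shape: (n_states, dim)


def pairwise_separability(means_a, means_b, W):
    """Summed squared distance between matching state means after projection.

    Assumption: a simplified separability measure chosen for illustration.
    """
    diff = (means_a - means_b) @ W  # shape: (n_states, low_dim)
    return float(np.sum(diff ** 2))


def min_pairwise_separability(class_means, W):
    """Smallest separability over all class pairs: the quantity to maximize."""
    labels = sorted(class_means)
    return min(
        pairwise_separability(class_means[a], class_means[b], W)
        for a, b in itertools.combinations(labels, 2)
    )


# Toy data: three classes, one 4-frame sequence each, 2-D features.
toy = {
    "a": [[[0, 0], [0, 0], [2, 0], [2, 0]]],
    "b": [[[0, 1], [0, 1], [2, 1], [2, 1]]],
    "c": [[[0, 5], [0, 5], [2, 5], [2, 5]]],
}
toy_means = {label: state_means(seqs, n_states=2) for label, seqs in toy.items()}

# Two candidate 1-D projections: keep the first or the second feature axis.
W_first = np.array([[1.0], [0.0]])
W_second = np.array([[0.0], [1.0]])
score_first = min_pairwise_separability(toy_means, W_first)
score_second = min_pairwise_separability(toy_means, W_second)
```

In this toy setup the classes differ only along the second feature axis, so `W_second` scores higher than `W_first`. A max-min method searches for the projection that maximizes this worst-case score rather than the average separability.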
