Paper information - Sequence classification via large margin hidden Markov models

Sequence classification via large margin hidden Markov models

We address the sequence classification problem using a probabilistic model based on hidden Markov models (HMMs). In contrast to commonly used likelihood-based learning methods, such as joint and conditional maximum likelihood estimation, we introduce a discriminative learning algorithm that focuses on class margin maximization. Our approach has two main advantages: (i) as an extension of support vector machines (SVMs) to sequential, non-Euclidean data, it inherits the benefits of margin-based classifiers, such as provable generalization error bounds; (ii) unlike many algorithms based on non-parametric estimation of similarity measures, which enforce only weak constraints on the data domain, it uses the HMM's latent Markov structure to regularize the model in the high-dimensional sequence space. In an extensive set of evaluations on time-series sequence data of the kind that frequently appears in data mining and computer vision, the proposed method shows significant improvements in classification performance.
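To make the notion of a class margin for HMM-based classifiers concrete, the following is a minimal sketch and not the paper's learning algorithm. It assumes a two-class problem with one discrete-output HMM per class, scores a sequence by the difference of the two log-likelihoods, and treats the label-signed score as the margin that a margin-maximizing learner would try to enlarge. The function names (`hmm_log_likelihood`, `signed_margin`, `log_model`) and the toy parameters are hypothetical and chosen only for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(obs, log_pi, log_A, log_B):
    """Forward algorithm in log space for a discrete-output HMM.

    log_pi[s]   : log initial probability of state s
    log_A[r][s] : log transition probability from state r to state s
    log_B[s][o] : log probability of emitting symbol o in state s
    """
    n_states = len(log_pi)
    alpha = [log_pi[s] + log_B[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[r] + log_A[r][s] for r in range(n_states)])
            + log_B[s][o]
            for s in range(n_states)
        ]
    return logsumexp(alpha)


def signed_margin(obs, label, model_pos, model_neg, log_prior_ratio=0.0):
    """Label-signed log-likelihood ratio; positive means correctly classified.

    label is +1 or -1. log_prior_ratio is an assumed class-prior term,
    log P(y=+1) - log P(y=-1), set to 0 for equal priors.
    """
    score = (
        hmm_log_likelihood(obs, *model_pos)
        - hmm_log_likelihood(obs, *model_neg)
        + log_prior_ratio
    )
    return label * score


def log_model(pi, A, B):
    """Convert probability tables to the log-domain tuple used above."""
    return (
        [math.log(p) for p in pi],
        [[math.log(p) for p in row] for row in A],
        [[math.log(p) for p in row] for row in B],
    )


# Toy (hypothetical) models: the positive class prefers long runs of the
# same symbol, the negative class prefers alternating symbols.
emissions = [[0.9, 0.1], [0.1, 0.9]]
model_pos = log_model([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], emissions)
model_neg = log_model([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], emissions)

runs = [0, 0, 0, 1, 1, 1]
alternating = [0, 1, 0, 1, 0, 1]

print("margin of runs labeled +1:", signed_margin(runs, +1, model_pos, model_neg))
print("margin of alternating labeled -1:", signed_margin(alternating, -1, model_pos, model_neg))
```

Under these assumptions, a likelihood-based learner would fit each class model to its own data, whereas a margin-based learner would adjust both parameter sets so that the smallest `signed_margin` over the training sequences becomes as large as possible.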

Vladimir Pavlovic | Minyoung Kim
