A recursive method for discriminative mixture learning

We consider the problem of learning mixture density models for classification. Traditional mixture learning for density estimation seeks models that represent the density accurately at all points in the sample space; discriminative learning, in contrast, aims to represent the density accurately near the decision boundary. We introduce a novel discriminative learning method for mixtures of generative models. Unlike traditional discriminative methods, which often resort to computationally demanding gradient-search optimization, the proposed method is highly efficient: it reduces to generative learning of the individual mixture components on weighted data. It is therefore particularly suited to domains with complex component models, such as hidden Markov models or Bayesian networks in general, which are usually too complex for effective gradient search. We demonstrate the benefits of the proposed method in a comprehensive set of evaluations on time-series sequence-classification problems.
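The core idea of reducing discriminative mixture learning to generative fitting on weighted data can be sketched as follows. This is a hypothetical illustration, not the paper's actual algorithm: it uses diagonal Gaussians as stand-ins for the richer generative components (HMMs, Bayesian networks) the abstract mentions, equal mixing weights, and a simple margin-based reweighting rule; the function names and the stage count are all illustrative assumptions.

```python
import numpy as np

def weighted_gaussian_fit(X, w):
    # Weighted MLE for a diagonal Gaussian -- a stand-in for any
    # generative component that supports weighted generative learning.
    w = w / w.sum()
    mu = (w[:, None] * X).sum(axis=0)
    var = (w[:, None] * (X - mu) ** 2).sum(axis=0) + 1e-6
    return mu, var

def gaussian_logpdf(X, mu, var):
    return -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)

def mixture_loglik(components, X):
    # Log-likelihood under an equal-weight mixture (log-sum-exp trick).
    logps = np.stack([gaussian_logpdf(X, mu, var) for mu, var in components])
    m = logps.max(axis=0)
    return m + np.log(np.exp(logps - m).mean(axis=0))

def fit(X, y, n_stages=3):
    # Recursively grow one mixture per class.  Each stage is purely
    # generative component fitting; discriminative focus comes only
    # from the data weights, which concentrate on examples near the
    # decision boundary (small or negative classification margin).
    classes = np.unique(y)
    mixtures = {c: [] for c in classes}
    w = np.ones(len(X))
    idx = np.searchsorted(classes, y)
    for _ in range(n_stages):
        for c in classes:
            m = (y == c)
            mixtures[c].append(weighted_gaussian_fit(X[m], w[m]))
        ll = np.stack([mixture_loglik(mixtures[c], X) for c in classes])
        own = ll[idx, np.arange(len(X))]
        rival = np.where(
            np.arange(len(classes))[:, None] == idx, -np.inf, ll
        ).max(axis=0)
        margin = own - rival                  # >0 iff correctly classified
        w = np.exp(-np.clip(margin, -10, 10)) # upweight boundary points
        w /= w.mean()
    return classes, mixtures

def predict(classes, mixtures, X):
    ll = np.stack([mixture_loglik(mixtures[c], X) for c in classes])
    return classes[ll.argmax(axis=0)]
```

Each stage costs no more than one weighted maximum-likelihood fit per class, which is what makes this reduction attractive when a gradient search over the component parameters would be impractical.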
