Discriminative Learning of Mixture of Bayesian Network Classifiers for Sequence Classification

A mixture of Bayesian network classifiers (BNCs) has the potential to yield classification and generative performance superior to that of a single BNC. We introduce novel discriminative learning methods for mixtures of BNCs. Unlike a single BNC, where discriminative learning resorts to a gradient search, we can exploit the properties of a mixture to alleviate this complex learning task. The proposed method adds mixture components recursively via functional gradient boosting while maximizing the conditional likelihood. The method is highly efficient because each boosting round reduces to generative learning of a base BNC on weighted data. The approach is particularly suited to sequence classification problems, where the kernels in the base model are usually too complex for an effective gradient search. We demonstrate the improved classification performance of the proposed methods in an extensive set of evaluations on time-series sequence data, including human motion classification problems.
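The core idea above — growing a mixture by functional gradient boosting, where each round reduces to generative learning of a base BNC on reweighted data — can be illustrated with a minimal sketch. The sketch below is not the paper's algorithm; it is an assumed simplification that uses a weighted Gaussian naive Bayes model as the base BNC (in place of the sequence models such as HMMs used in the paper), reweights examples by how poorly the current mixture explains their labels (an approximation to the conditional-likelihood gradient), and combines components as a convex mixture of joint distributions. The data generator, the fixed step size, and all function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n=400):
    # Toy 2-class data: two Gaussian blobs per class, so a single
    # generative component underfits and a mixture can help.
    X0 = np.vstack([rng.normal([-2, 0], 0.5, (n // 4, 2)),
                    rng.normal([2, 0], 0.5, (n // 4, 2))])
    X1 = np.vstack([rng.normal([0, -2], 0.5, (n // 4, 2)),
                    rng.normal([0, 2], 0.5, (n // 4, 2))])
    return np.vstack([X0, X1]), np.array([0] * (n // 2) + [1] * (n // 2))

class WeightedGNB:
    """Gaussian naive Bayes fit by *weighted* maximum likelihood --
    standing in for generative learning of a base BNC on weighted data."""
    def fit(self, X, y, w):
        self.classes = np.unique(y)
        self.mu, self.var, self.prior = [], [], []
        for c in self.classes:
            wc = w * (y == c)
            s = wc.sum()
            mu = (wc[:, None] * X).sum(0) / s
            var = (wc[:, None] * (X - mu) ** 2).sum(0) / s + 1e-3
            self.mu.append(mu); self.var.append(var)
            self.prior.append(s / w.sum())
        return self

    def log_joint(self, X):
        # log P(x, y=c) for each class c, under the naive Bayes factorization
        cols = []
        for mu, var, p in zip(self.mu, self.var, self.prior):
            ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(1)
            cols.append(np.log(p) + ll)
        return np.column_stack(cols)

def boost_mixture(X, y, rounds=5, step=0.5):
    n = len(X)
    w = np.ones(n) / n          # start from uniform example weights
    comps, mix = [], None       # mix holds the running mixture joint P(x, y)
    for _ in range(rounds):
        comp = WeightedGNB().fit(X, y, w)         # generative step on weighted data
        joint = np.exp(comp.log_joint(X))
        mix = joint if mix is None else (1 - step) * mix + step * joint
        comps.append(comp)
        # Conditional likelihood P(y_i | x_i) under the current mixture;
        # examples the mixture explains poorly get more weight next round.
        cond = mix[np.arange(n), y] / mix.sum(1)
        w = (1 - cond) + 1e-12
        w /= w.sum()
    return comps, mix

X, y = make_data()
comps, mix = boost_mixture(X, y)
acc = (mix.argmax(1) == y).mean()
```

Note the division of labor that makes this efficient: the only learning inside the loop is plain weighted maximum-likelihood fitting of the base model; the discriminative objective enters solely through the example weights, so no gradient search over the base model's parameters is needed.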
