Variational Bayes for continuous hidden Markov models and its application to active learning

In this paper, we present a variational Bayes (VB) framework for learning continuous hidden Markov models (CHMMs), and we examine the VB framework within active learning. Unlike a maximum likelihood or maximum a posteriori training procedure, which yields a point estimate of the CHMM parameters, VB-based training yields an estimate of the full posterior of the model parameters. This is particularly important for small training sets, since it gives a measure of confidence in the accuracy of the learned model. The posterior is exploited for active learning, in which we acquire labels for those feature vectors whose labels would be most informative for reducing model-parameter uncertainty. Three active learning algorithms are considered: 1) query by committee (QBC), which selects for labeling the data that minimize the classification variance, 2) a maximum expected information gain method that labels data so as to reduce the entropy of the model parameters, and 3) an error-reduction-based procedure that attempts to minimize classification error over the test data. Experimental results are presented for both synthetic and measured data. We demonstrate that all of these active learning methods can significantly reduce the amount of labeling required, relative to random selection of samples for labeling.
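
To convey the flavor of the committee-based selection step, the sketch below draws committee members from a parameter posterior and queries the unlabeled point on which the committee disagrees most. This is a minimal illustration only: the two-class Gaussian model, the committee size, and the vote-entropy disagreement score are assumptions made for brevity and are not the paper's CHMM formulation.

```python
# Toy query-by-committee (QBC) selection driven by a parameter posterior.
# Illustrative stand-in for the paper's setting, where committee members
# would be CHMMs sampled from the variational posterior.
import numpy as np

rng = np.random.default_rng(0)

# Toy "posterior" over the class-1 mean: its spread reflects the
# uncertainty remaining after only a few labeled samples.
posterior_mean, posterior_std = 1.0, 0.8

def committee_votes(x, n_members=10):
    """Each committee member is one draw from the parameter posterior;
    it votes for class 1 if x lies closer to its sampled class-1 mean
    than to the (fixed) class-0 mean at 0."""
    means = rng.normal(posterior_mean, posterior_std, size=n_members)
    return (np.abs(x - means) < np.abs(x - 0.0)).astype(int)

def vote_entropy(votes):
    """Disagreement measure: entropy of the committee's vote distribution."""
    p = votes.mean()
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# Pool of unlabeled samples; query the one with the largest disagreement.
pool = np.linspace(-2.0, 3.0, 51)
scores = np.array([vote_entropy(committee_votes(x)) for x in pool])
print("query next:", pool[np.argmax(scores)])
```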
