A New Framework for Machine Learning

The last five years have seen the emergence of a powerful new framework for building sophisticated real-world applications based on machine learning. The cornerstones of this approach are (i) the adoption of a Bayesian viewpoint, (ii) the use of graphical models to represent complex probability distributions, and (iii) the development of fast, deterministic inference algorithms, such as variational Bayes and expectation propagation, which solve inference and learning problems efficiently through local message passing. This paper reviews the key ideas behind this framework and highlights some of its major benefits, illustrating them with a large-scale example application.

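The local message-passing idea can be made concrete with a small example. The sketch below is not taken from the paper: it runs the sum-product (belief propagation) algorithm on an assumed three-variable binary chain with made-up potential values, then checks the resulting marginals against brute-force enumeration.

```python
import numpy as np

# Unary potentials phi_i(x_i) for three binary variables (values assumed
# purely for illustration) and a shared pairwise potential psi(x_i, x_j)
# that favours neighbouring variables agreeing.
phi = [np.array([0.7, 0.3]),
       np.array([0.5, 0.5]),
       np.array([0.2, 0.8])]
psi = np.array([[0.9, 0.1],
                [0.1, 0.9]])

# Forward messages: fwd[i] is the message arriving at x_{i+1} from the left.
fwd = [np.ones(2)]                        # nothing to the left of x1
for i in range(2):
    fwd.append(psi.T @ (phi[i] * fwd[i]))

# Backward messages: bwd[i] is the message arriving at x_{i+1} from the right.
bwd = [np.ones(2)]                        # nothing to the right of x3
for i in range(2, 0, -1):
    bwd.insert(0, psi @ (phi[i] * bwd[0]))

# Node marginal = local potential times both incoming messages, normalised.
for i in range(3):
    b = phi[i] * fwd[i] * bwd[i]
    print(f"p(x{i+1}) =", b / b.sum())

# Sanity check: enumerate all 8 joint configurations and compare p(x2).
joint = np.zeros((2, 2, 2))
for a in range(2):
    for b2 in range(2):
        for c in range(2):
            joint[a, b2, c] = (phi[0][a] * phi[1][b2] * phi[2][c]
                               * psi[a, b2] * psi[b2, c])
joint /= joint.sum()
print("p(x2) by enumeration:", joint.sum(axis=(0, 2)))
```

On a tree-structured graph such as this chain, the local messages yield exact marginals; on graphs with cycles the same updates are only approximate, which is where deterministic schemes such as variational Bayes and expectation propagation come in.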