Advanced Lectures on Machine Learning

This chapter describes Lagrange multipliers and a selection of topics from matrix analysis from a machine learning perspective. The goal is to give a detailed account of a number of mathematical constructions that are widely used in applied machine learning.
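
As a brief illustration of the first topic, the Lagrange multiplier method converts a constrained optimization problem into an unconstrained one. The following worked example is our own sketch, not necessarily the one used in the chapter: finding the maximum-entropy distribution over n outcomes subject only to normalization.

  % Lagrangian: entropy plus a multiplier times the normalization constraint
  L(p, \lambda) = -\sum_{i=1}^{n} p_i \ln p_i + \lambda \Big( \sum_{i=1}^{n} p_i - 1 \Big)

  % Stationarity with respect to each p_i
  \frac{\partial L}{\partial p_i} = -\ln p_i - 1 + \lambda = 0
  \;\Rightarrow\; p_i = e^{\lambda - 1}

Enforcing the constraint \sum_i p_i = 1 then gives p_i = 1/n, the uniform distribution; the same recipe underlies many maximum-entropy and constrained maximum-likelihood derivations in machine learning.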
