Learning from Incomplete Data

Real-world learning tasks often involve high-dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives: the likelihood-based and the Bayesian. The goal is two-fold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner. These algorithms are based on mixture modeling and make two distinct appeals to the Expectation-Maximization (EM) principle (Dempster, Laird, and Rubin 1977), both for the estimation of mixture components and for coping with the missing data.
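
To make the two uses of EM concrete, the sketch below fits a Gaussian mixture to data whose missing entries are marked with NaN, under a missing-at-random assumption. It is a minimal NumPy illustration, not the authors' code: the E-step computes responsibilities from each point's observed coordinates and, for every component, the conditional mean and covariance of the missing coordinates; the M-step then re-estimates the mixing proportions, means, and covariances from these filled-in sufficient statistics. The function name em_gmm_missing and every implementation detail are assumptions made for illustration.

import numpy as np

def em_gmm_missing(X, K, n_iter=50, reg=1e-6, seed=0):
    """EM for a Gaussian mixture fit to data with missing entries (NaN).

    Assumes features are missing at random and that every row has at least
    one observed feature.  The E-step computes responsibilities from the
    observed coordinates and fills in the missing coordinates with their
    conditional means (keeping the conditional covariance) so that the
    M-step updates use complete-data sufficient statistics.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    obs = ~np.isnan(X)

    # Crude initialisation: pick K rows (holes filled with column means),
    # identity covariances, uniform mixing proportions.
    X_filled = np.where(obs, X, np.nanmean(X, axis=0))
    mu = X_filled[rng.choice(n, K, replace=False)]
    Sigma = np.stack([np.eye(d) for _ in range(K)])
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # ---- E-step ----------------------------------------------------
        log_r = np.zeros((n, K))
        x_hat = np.zeros((n, K, d))        # E[x_i | z_i = k, observed part]
        C_hat = np.zeros((n, K, d, d))     # Cov[x_i | z_i = k, observed part]
        for i in range(n):
            o, m = obs[i], ~obs[i]
            xo = X[i, o]
            for k in range(K):
                Soo = Sigma[k][np.ix_(o, o)] + reg * np.eye(o.sum())
                diff = xo - mu[k, o]
                sol = np.linalg.solve(Soo, diff)
                _, logdet = np.linalg.slogdet(Soo)
                # log of pi_k times the Gaussian density of the observed block
                log_r[i, k] = (np.log(pi[k])
                               - 0.5 * (o.sum() * np.log(2 * np.pi)
                                        + logdet + diff @ sol))
                x_hat[i, k, o] = xo
                if m.any():
                    # Regression of the missing block on the observed block.
                    Smo = Sigma[k][np.ix_(m, o)]
                    x_hat[i, k, m] = mu[k, m] + Smo @ sol
                    C_hat[i, k][np.ix_(m, m)] = (
                        Sigma[k][np.ix_(m, m)]
                        - Smo @ np.linalg.solve(Soo, Smo.T))
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # ---- M-step ----------------------------------------------------
        Nk = r.sum(axis=0) + 1e-12
        pi = Nk / n
        for k in range(K):
            mu[k] = (r[:, k, None] * x_hat[:, k]).sum(axis=0) / Nk[k]
            diffs = x_hat[:, k] - mu[k]
            Sigma[k] = ((r[:, k, None, None]
                         * (diffs[:, :, None] * diffs[:, None, :] + C_hat[:, k]))
                        .sum(axis=0) / Nk[k]) + reg * np.eye(d)
    return pi, mu, Sigma, r

# Toy usage: two well-separated 2-D clusters; delete one coordinate from
# a random 40% of the rows so every row keeps at least one observed value.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(4.0, 1.0, (200, 2))])
miss_rows = rng.random(X.shape[0]) < 0.4
X[miss_rows, rng.integers(0, 2, miss_rows.sum())] = np.nan
pi_hat, mu_hat, Sigma_hat, resp = em_gmm_missing(X, K=2)

In this sketch, a missing feature can afterwards be imputed by averaging the per-component conditional means in x_hat weighted by the responsibilities r; the same conditional-expectation machinery yields regression estimates when some coordinates are treated as targets, which is the sense in which mixture density estimation supports function approximation from incomplete data.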

[1]  N. Metropolis, et al.  Equation of State Calculations by Fast Computing Machines, 1953, Journal of Chemical Physics.

[2]  W. K. Hastings.  Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.

[3]  Donald B. Rubin, et al.  Maximum Likelihood from Incomplete Data, 1972.

[4]  G. C. Tiao, et al.  Bayesian inference in statistical analysis, 1973.

[5]  Richard O. Duda and Peter E. Hart.  Pattern classification and scene analysis, 1974, Wiley-Interscience.

[6]  A. P. Dempster, N. M. Laird, and D. B. Rubin.  Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977, Journal of the Royal Statistical Society, Series B.

[7]  Donald Geman, et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  R. Little, et al.  Maximum likelihood estimation for mixed continuous and categorical data with missing values, 1985.

[9]  Geoffrey E. Hinton, et al.  Learning and relearning in Boltzmann machines, 1986.

[10]  W. Wong, et al.  The calculation of posterior distributions by data augmentation, 1987.

[11]  James Kelly, et al.  AutoClass: A Bayesian Classification System, 1993, ML.

[12]  John Moody, et al.  Fast Learning in Networks of Locally-Tuned Processing Units, 1989, Neural Computation.

[13]  David J. Hand, et al.  Mixture Models: Inference and Applications to Clustering, 1989.

[14]  Halbert White.  Learning in Artificial Neural Networks: A Statistical Perspective, 1989, Neural Computation.

[15]  J. Ross Quinlan.  Unknown Attribute Values in Induction, 1989, ML.

[16]  Tomaso A. Poggio, et al.  Extensions of a Theory of Networks for Approximation and Learning, 1990, NIPS.

[17]  Donald F. Specht.  A general regression neural network, 1991, IEEE Transactions on Neural Networks.

[18]  Steven J. Nowlan.  Soft competitive adaptation: neural network learning algorithms based on fitting statistical mixtures, 1991.

[19]  J. H. Friedman.  Multivariate adaptive regression splines, 1991.

[20]  Wray L. Buntine, et al.  Bayesian Back-Propagation, 1991, Complex Systems.

[21]  Zoubin Ghahramani, et al.  Solving inverse problems using an EM approach to density estimation, 1993.

[22]  Volker Tresp, et al.  Training Neural Networks with Deficient Data, 1993, NIPS.

[23]  Robert A. Jacobs, et al.  Hierarchical Mixtures of Experts and the EM Algorithm, 1993, Neural Computation.

[24]  C. M. Bishop.  Mixture Density Networks, 1994.

[25]  John L. Casti, et al.  The Theory of Networks, 1995.

[26]  R. Tibshirani, et al.  Discriminant Analysis by Gaussian Mixtures, 1996.