Linear Concepts and Hidden Variables

We study a learning problem that allows a “fair” comparison between unsupervised learning methods (probabilistic model construction) and more traditional algorithms that directly learn a classification. The merits of each approach are intuitively clear: inducing a model is more expensive computationally, but may support a wider range of predictions. Its performance, however, will depend on how well the postulated probabilistic model fits the data. To compare the paradigms we consider a model which postulates a single binary-valued hidden variable on which all other attributes depend. In this model, finding the most likely value of any one variable (given known values for the others) reduces to testing a linear function of the observed values. We learn the model with two techniques: the standard EM algorithm, and a new algorithm we develop based on covariances. We compare these, in a controlled fashion, against an algorithm (a version of Winnow) that attempts to find a good linear classifier directly. Our conclusions help delimit the fragility of using a model that is even “slightly” simpler than the distribution actually generating the data, versus the relative robustness of directly searching for a good predictor.
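The reduction to a linear test can be made explicit with a short derivation. The following is a minimal sketch, assuming binary attributes that are conditionally independent given the hidden variable; the notation ($\pi$, $p_i$, $q_i$) is introduced here for illustration and is not taken from the paper.

Let $H \in \{0,1\}$ be the hidden variable with prior $\pi = \Pr[H=1]$, and let the observed attributes $X_1,\dots,X_n \in \{0,1\}$ be conditionally independent given $H$, with $p_i = \Pr[X_i=1 \mid H=1]$ and $q_i = \Pr[X_i=1 \mid H=0]$. By Bayes' rule, the posterior log-odds of $H$ given an observation $x = (x_1,\dots,x_n)$ is
\[
\log\frac{\Pr[H=1 \mid x]}{\Pr[H=0 \mid x]}
= \log\frac{\pi}{1-\pi}
+ \sum_{i=1}^{n}\left( x_i \log\frac{p_i}{q_i} + (1-x_i)\log\frac{1-p_i}{1-q_i} \right)
= w \cdot x + b,
\]
where $w_i = \log\frac{p_i(1-q_i)}{q_i(1-p_i)}$ and $b = \log\frac{\pi}{1-\pi} + \sum_{i=1}^{n}\log\frac{1-p_i}{1-q_i}$. Predicting the most likely value of $H$ therefore amounts to testing the sign of a linear function of the observations, and a similar comparison of the two posterior terms shows that the most likely value of any single attribute given the others is decided by such a linear threshold test as well. This is exactly the class of predictors that Winnow searches for directly, which is what makes the comparison between the two paradigms a controlled one.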
