Some learning techniques for classification tasks work indirectly, by first trying to fit a full probabilistic model to the observed data. Whether this is a good idea depends on how robust the approach is to deviations from the postulated model. We study this question experimentally in a restricted, yet non-trivial and interesting, case: we consider a conditionally independent attribute (CIA) model, which postulates a single binary-valued hidden variable z such that all other attributes (i.e., the target and the observables) depend on z and are mutually independent given z. In this model, finding the most likely value of any one variable (given known values for the others) reduces to testing a linear function of the observed values.
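To make the linearity claim concrete, here is a minimal sketch that assumes all attributes (including the target y) are binary, the CIA parameters are already known, and P(y=1|z=1) > P(y=1|z=0). The function and parameter names (`cia_linear_rule`, `posterior_y1`, `p_x_z1`, and so on) and the numeric values are hypothetical illustrations, not taken from the paper. The log-odds of z given x has the familiar naive-Bayes linear form. Because P(y=1|x) = P(y=1|z=0) + (P(y=1|z=1) - P(y=1|z=0)) P(z=1|x) is increasing in P(z=1|x), asking whether y=1 is the more likely value is a threshold test on that same linear function.

```python
import itertools
import math


def cia_linear_rule(prior_z, p_x_z1, p_x_z0, p_y_z1, p_y_z0):
    """Return (weights, bias) such that predicting y = 1 iff w.x + bias > 0
    reproduces the most-likely-value prediction of y under a CIA model.

    Assumes binary attributes, all probabilities strictly inside (0, 1),
    and p_y_z1 > p_y_z0 (otherwise swap the labels of z).
    """
    if not p_y_z1 > p_y_z0:
        raise ValueError("expected P(y=1|z=1) > P(y=1|z=0)")
    # Naive-Bayes log-odds of z = 1 given x: const + sum_i weights[i] * x[i].
    weights = [math.log(a / b) - math.log((1 - a) / (1 - b))
               for a, b in zip(p_x_z1, p_x_z0)]
    const = math.log(prior_z / (1 - prior_z)) + sum(
        math.log((1 - a) / (1 - b)) for a, b in zip(p_x_z1, p_x_z0))
    # P(y=1|x) = p_y_z0 + (p_y_z1 - p_y_z0) * P(z=1|x) > 1/2  iff  P(z=1|x) > t.
    t = (0.5 - p_y_z0) / (p_y_z1 - p_y_z0)
    if t <= 0.0:
        bias = math.inf    # y = 1 is the more likely value for every x
    elif t >= 1.0:
        bias = -math.inf   # y = 0 is the more likely value for every x
    else:
        # P(z=1|x) > t  iff  log-odds of z exceeds logit(t).
        bias = const - math.log(t / (1 - t))
    return weights, bias


def posterior_y1(x, prior_z, p_x_z1, p_x_z0, p_y_z1, p_y_z0):
    """Compute P(y = 1 | x) directly by marginalizing out the hidden z."""
    lik1, lik0 = prior_z, 1 - prior_z
    for xi, a, b in zip(x, p_x_z1, p_x_z0):
        lik1 *= a if xi else 1 - a
        lik0 *= b if xi else 1 - b
    pz1 = lik1 / (lik1 + lik0)
    return p_y_z0 + (p_y_z1 - p_y_z0) * pz1


# Illustrative (made-up) parameters for four binary observables.
params = dict(prior_z=0.4, p_x_z1=[0.9, 0.7, 0.6, 0.2],
              p_x_z0=[0.2, 0.4, 0.5, 0.6], p_y_z1=0.85, p_y_z0=0.1)
w, bias = cia_linear_rule(**params)

# Brute-force check over all 2^4 inputs: the linear test agrees with the posterior.
agree = [(sum(wi * xi for wi, xi in zip(w, x)) + bias > 0) ==
         (posterior_y1(x, **params) > 0.5)
         for x in itertools.product([0, 1], repeat=4)]
print(all(agree))  # prints True
```

The brute-force comparison is only feasible for a handful of attributes; its purpose is to show that the closed-form weights and bias reproduce the exact posterior decision, so the CIA classifier is a linear threshold function.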
We learn the CIA model with two techniques: the standard EM algorithm, and a new algorithm we develop based on covariances. We compare these, in a controlled fashion, against an algorithm (a version of Winnow) that attempts to find a good linear classifier directly. Our conclusions help delimit the fragility of using the CIA model for classification: once the data departs from this model, performance quickly degrades and drops below that of the directly learned linear classifier.
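The directly learned baseline is described only as "a version of Winnow". As a hedged illustration, the sketch below implements Littlestone's standard Winnow2 update (multiplicative promotion and demotion of the weights of active features) with assumed parameters alpha = 2 and theta = n; it is not necessarily the variant or the settings used in the experiments. The target concept (x0 OR x2 over ten Boolean features) and the names `winnow2_train` and `winnow2_predict` are hypothetical.

```python
import itertools


def winnow2_predict(weights, theta, x):
    """Predict 1 iff the total weight of the active (x_i = 1) features reaches theta."""
    return int(sum(w for w, xi in zip(weights, x) if xi) >= theta)


def winnow2_train(examples, n, alpha=2.0, theta=None, max_epochs=100):
    """Mistake-driven Winnow2 training with multiplicative updates.

    Weights start at 1. On a false negative, the weights of active features
    are multiplied by alpha (promotion); on a false positive they are divided
    by alpha (demotion). Passes repeat until an epoch makes no mistakes.
    """
    theta = float(n) if theta is None else theta
    weights = [1.0] * n
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            if winnow2_predict(weights, theta, x) == y:
                continue
            mistakes += 1
            factor = alpha if y == 1 else 1.0 / alpha
            weights = [w * factor if xi else w for w, xi in zip(weights, x)]
        if mistakes == 0:
            break
    return weights, theta


# Hypothetical target: the disjunction x0 OR x2 over n = 10 Boolean features.
n = 10
examples = [(x, int(x[0] or x[2])) for x in itertools.product([0, 1], repeat=n)]
weights, theta = winnow2_train(examples, n)
errors = sum(winnow2_predict(weights, theta, x) != y for x, y in examples)
print(errors)  # prints 0
```

Because the updates are multiplicative, Winnow's mistake bound grows only logarithmically with the number of irrelevant features, which makes it a natural direct linear learner to compare against a model-based approach.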