Sharp, reliable predictions using supervised mixture models

This doctoral dissertation develops a new way to make probabilistic predictions from a database of examples. The method looks for regions in the data where different predictions are appropriate, and it naturally extends clustering algorithms that have been used with great success in exploratory data analysis. In probabilistic terms, the new method considers the same mixture models as before, but it scores them by the conditional probability they assign to a single target feature rather than by the joint probability they assign to all features. A good model is therefore forced to classify the data in a way that is useful for the single, desired prediction, rather than just identifying the strongest overall pattern in the data. The results of this dissertation extend the clean Bayesian approach of the unsupervised AutoClass system to the supervised learning problems common in everyday practice. Highlights include: (1) clear probabilistic semantics; (2) prediction and use of discrete, categorical, and continuous data; (3) priors that avoid the overfitting problem; (4) an explicit noise model to identify unreliable predictions; (5) the ability to handle missing data. A computer implementation, MultiClass, validates the ideas with performance that exceeds neural nets, decision trees, and other current supervised machine learning systems. The dissertation is written for a general audience, with many examples to motivate the new ideas. The scope of potential applications is very large, including problems like evaluating student admissions applications, assessing credit risk, and identifying customers likely to order from Tiffany's latest Christmas gift catalog.
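The difference between scoring a mixture model by joint probability and by conditional probability can be illustrated with a small sketch. This is not the MultiClass implementation: the function names, the two-class toy model, and the assumption that features are independent within each class are all hypothetical choices made for illustration, and the priors described in the abstract are omitted.

```python
import math

# Hypothetical toy mixture model over two binary features (x, y).
# Each class is (weight, P(x | class), P(y | class)); features are
# assumed independent within a class (an illustrative assumption).
model = [
    (0.5, [0.9, 0.1], [0.8, 0.2]),
    (0.5, [0.1, 0.9], [0.2, 0.8]),
]

def joint_prob(model, x, y):
    """P(x, y) = sum over classes of weight * P(x | class) * P(y | class)."""
    return sum(w * px[x] * py[y] for w, px, py in model)

def marginal_x(model, x):
    """P(x) = sum over classes of weight * P(x | class)."""
    return sum(w * px[x] for w, px, _ in model)

def cond_prob(model, x, y):
    """P(y | x) = P(x, y) / P(x): the probability of the target feature alone."""
    return joint_prob(model, x, y) / marginal_x(model, x)

def joint_log_likelihood(model, data):
    """Unsupervised-style score: how well the model explains all features."""
    return sum(math.log(joint_prob(model, x, y)) for x, y in data)

def conditional_log_likelihood(model, data):
    """Supervised-style score: how well the model predicts y given x."""
    return sum(math.log(cond_prob(model, x, y)) for x, y in data)

# Made-up examples, (x, y) pairs, used only to exercise the two scores.
data = [(0, 0), (0, 0), (1, 1), (1, 0)]
joint_score = joint_log_likelihood(model, data)
cond_score = conditional_log_likelihood(model, data)
```

Under the conditional score, a model earns no credit for explaining how `x` is distributed; only its prediction of `y` counts. That is the sense in which a good model must classify the data in a way that is useful for the desired prediction.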