We address statistical classifier design given a mixed training set consisting of a small set of labelled features and a (generally larger) set of unlabelled features. This situation arises, for example, in medical imaging, where training features may be plentiful but extracting their class labels requires expensive expert annotation. We propose a classifier structure and learning algorithm that make effective use of the unlabelled data to improve performance. Learning is based on maximizing the total data likelihood, i.e., the likelihood over both the labelled and unlabelled subsets. Two distinct EM learning algorithms are proposed, differing in the EM formalism applied to the unlabelled data. The classifier, based on a joint probability model for features and labels, is a "mixture of experts" structure equivalent to the radial basis function (RBF) classifier but, unlike RBFs, amenable to likelihood-based training. The method's scope of application is greatly extended by the observation that test data, or any new data to classify, constitute additional unlabelled data; thus a combined learning/classification operation, much akin to what is done in image segmentation, can be invoked whenever there is new data to classify. Experiments on data sets from the UC Irvine repository demonstrate that the new learning algorithms and structure achieve substantial performance gains over alternative approaches.
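To make the total-likelihood EM idea concrete, the following is a minimal Python/NumPy sketch, not the authors' exact algorithm (the paper proposes two distinct EM variants). It assumes a spherical-Gaussian joint mixture p(x, c) = sum_j alpha_j N(x; mu_j, var_j I) beta[j, c]; labelled points contribute log p(x, c) to the likelihood and unlabelled points contribute log p(x). All function and variable names are illustrative.

import numpy as np

def fit_semisupervised_mixture(X_lab, y_lab, X_unl, n_components, n_classes,
                               n_iter=100, seed=0):
    """EM for p(x, c) = sum_j alpha_j N(x; mu_j, var_j I) beta[j, c].

    Maximizes the total log-likelihood: labelled points contribute
    log p(x, c); unlabelled points contribute log p(x), with the
    label marginalized out.
    """
    rng = np.random.default_rng(seed)
    d = X_lab.shape[1]
    X_all = np.vstack([X_lab, X_unl])
    n_lab = len(X_lab)

    # Initialize components at random data points, uniform mixing and labels.
    mu = X_all[rng.choice(len(X_all), n_components, replace=False)]
    var = np.full(n_components, X_all.var())
    alpha = np.full(n_components, 1.0 / n_components)
    beta = np.full((n_components, n_classes), 1.0 / n_classes)

    for _ in range(n_iter):
        # E-step: component responsibilities for every point.
        sq = ((X_all[:, None, :] - mu[None]) ** 2).sum(-1)          # (n, J)
        log_gauss = -0.5 * (sq / var + d * np.log(2 * np.pi * var))
        log_r = np.log(alpha) + log_gauss
        # Labelled points also condition on the observed class label.
        log_r[:n_lab] += np.log(beta[:, y_lab]).T
        r = np.exp(log_r - log_r.max(1, keepdims=True))
        r /= r.sum(1, keepdims=True)

        # M-step: closed-form updates from the responsibilities.
        nk = r.sum(0)
        alpha = nk / nk.sum()
        mu = (r.T @ X_all) / nk[:, None]
        sq = ((X_all[:, None, :] - mu[None]) ** 2).sum(-1)
        var = np.maximum((r * sq).sum(0) / (d * nk), 1e-6)
        # Label probabilities are re-estimated from labelled data only.
        r_lab = r[:n_lab]
        beta = np.stack([r_lab[y_lab == c].sum(0) for c in range(n_classes)], 1)
        beta /= beta.sum(1, keepdims=True) + 1e-12

    return alpha, mu, var, beta

def predict(X, alpha, mu, var, beta):
    # argmax_c p(c | x) under the fitted joint model.
    d = X.shape[1]
    sq = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
    log_gauss = -0.5 * (sq / var + d * np.log(2 * np.pi * var))
    r = np.exp(np.log(alpha) + log_gauss)
    return np.argmax(r @ beta, axis=1)

Note that the combined learning/classification mode described above falls out naturally from this structure: since test features enter the likelihood only as unlabelled data, one simply appends them to X_unl before fitting and then calls predict on them.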