Learning from labeled and unlabeled data using a minimal number of queries

The considerable time and expense of labeling data have prompted the development of algorithms that maximize classification accuracy for a given amount of labeling effort. One line of work develops "active learning" algorithms, which sequentially choose the patterns to be explicitly labeled so as to extract the maximum information gain from each labeling. Another develops algorithms that learn from labeled as well as the more abundant unlabeled data. This paper proposes an algorithm that integrates the benefits of both approaches. Our method reverses the roles of the labeled and unlabeled data: a Genetic Algorithm (GA) iteratively refines the class memberships of the unlabeled patterns so that a maximum a posteriori (MAP) classifier trained on them predicts labels for the labeled dataset that agree with the known labels. This role reversal yields an implicit class assignment for the unlabeled patterns. For active learning, a subset of the GA population is used to construct multiple MAP classifiers, and the points in the input space on which these classifiers disagree most are selected for explicit labeling. The two phases, learning from labeled and unlabeled data and active learning, are interleaved and together provide accurate classification while minimizing the labeling effort.
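To make the procedure concrete, the sketch below illustrates the two interleaved phases under simplifying assumptions: Gaussian class-conditional densities stand in for whatever density model the paper actually uses in its MAP classifier, the GA uses generic truncation selection with one-point crossover and point mutation rather than the paper's exact operators, and vote entropy serves as one possible measure of committee disagreement. All function names (`map_classifier`, `fitness`, `ga_refine`, `select_query`) are hypothetical.

```python
# A minimal sketch of the role-reversal GA and the disagreement-based query step.
# Assumptions: Gaussian class-conditional MAP classifier, truncation selection,
# one-point crossover, point mutation, vote-entropy disagreement.
import numpy as np

rng = np.random.default_rng(0)

def map_classifier(X, y, n_classes):
    """Fit Gaussian class-conditional densities and priors; return a MAP predictor."""
    params = []
    for c in range(n_classes):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        cov = np.cov(Xc.T) + 1e-3 * np.eye(X.shape[1])   # regularized covariance
        params.append((mu, np.linalg.inv(cov),
                       np.linalg.slogdet(cov)[1], np.log(len(Xc) / len(X))))
    def predict(Xq):
        scores = np.stack([
            -0.5 * np.einsum('ij,jk,ik->i', Xq - mu, icov, Xq - mu)
            - 0.5 * logdet + logp
            for mu, icov, logdet, logp in params], axis=1)
        return scores.argmax(axis=1)                     # MAP class per point
    return predict

def fitness(chrom, X_unlab, X_lab, y_lab, n_classes):
    """Role reversal: train on the *unlabeled* data under the candidate labels
    in `chrom`, then score agreement with the *known* labels of the labeled set."""
    if np.bincount(chrom, minlength=n_classes).min() < 2:
        return 0.0                                       # degenerate labeling
    pred = map_classifier(X_unlab, chrom, n_classes)(X_lab)
    return float((pred == y_lab).mean())

def ga_refine(X_unlab, X_lab, y_lab, n_classes,
              pop_size=40, gens=100, p_mut=0.02):
    """Evolve candidate class assignments of the unlabeled patterns."""
    pop = rng.integers(0, n_classes, size=(pop_size, len(X_unlab)))
    for _ in range(gens):
        fit = np.array([fitness(c, X_unlab, X_lab, y_lab, n_classes) for c in pop])
        parents = pop[np.argsort(fit)[::-1][:pop_size // 2]]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, len(a))                # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(len(child)) < p_mut        # point mutation
            child[flip] = rng.integers(0, n_classes, flip.sum())
            children.append(child)
        pop = np.vstack([parents] + children)
    fit = np.array([fitness(c, X_unlab, X_lab, y_lab, n_classes) for c in pop])
    return pop, fit

def select_query(pop, fit, X_unlab, X_pool, n_classes, committee=5):
    """Active learning step: the best chromosomes each induce a MAP classifier;
    the pool point with maximal vote entropy (disagreement) is queried next."""
    best = pop[np.argsort(fit)[::-1][:committee]]
    votes = np.stack([map_classifier(X_unlab, c, n_classes)(X_pool) for c in best])
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)], axis=1)
    p = counts / committee
    entropy = -(p * np.log(np.clip(p, 1e-12, 1.0))).sum(axis=1)
    return int(np.argmax(entropy))                       # index of point to label
```

In use, `ga_refine` and `select_query` would alternate: each queried point is labeled and moved from the pool into the labeled set, and the GA is rerun with the enlarged labeled set, mirroring the interleaving of the two phases described above.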
