Active Learning Literature Survey

Gathering labeled data to train a model or estimate its parameters is often the most time-consuming and expensive task in machine learning. In real-world settings, labeled data is scarce, and the resources available for labeling the abundant unlabeled data are limited. It therefore makes sense to pick only the most informative instances from the unlabeled data and ask an expert to label them. Active learning algorithms aim to minimize the amount of labeled data required to achieve the goal of the machine learning task at hand by strategically selecting the instances to be labeled by the expert. Research in this area over the past two decades has led to substantial improvements in the performance of several existing machine learning algorithms and has been applied to diverse fields such as text classification, information retrieval, computer vision, and bioinformatics. This survey provides insight into that research and categorizes the diverse algorithms proposed according to their main characteristics. We also provide a common setting in which different active learning algorithms can be compared by evaluating them on benchmark datasets.
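The select-then-label loop described above can be illustrated with a minimal sketch. It assumes a pool-based setting and uses uncertainty sampling, which is only one of many query strategies and is not claimed to be the one this survey favors. The 1-D threshold classifier, the `oracle` function standing in for the human expert, the true boundary at 0.6, and the evenly spaced pool are all hypothetical choices made for illustration.

```python
def oracle(x):
    """Hypothetical stand-in for the expert: the true label is 1 when x >= 0.6."""
    return 1 if x >= 0.6 else 0


def fit_threshold(labeled):
    """Fit a 1-D threshold classifier: the midpoint between the largest
    labeled negative and the smallest labeled positive."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    return (max(negatives) + min(positives)) / 2.0


def most_uncertain(pool, threshold):
    """Uncertainty sampling: choose the unlabeled point closest to the
    current decision boundary, where the model is least confident."""
    return min(pool, key=lambda x: abs(x - threshold))


def active_learn(pool, seed_labeled, budget):
    """Query the oracle for at most `budget` labels, one instance at a time."""
    pool = list(pool)
    labeled = list(seed_labeled)
    for _ in range(budget):
        if not pool:
            break
        threshold = fit_threshold(labeled)   # retrain on the labels so far
        x = most_uncertain(pool, threshold)  # pick the most informative instance
        pool.remove(x)
        labeled.append((x, oracle(x)))       # ask the expert for its label
    return fit_threshold(labeled), labeled


# Assumed toy data: an evenly spaced unlabeled pool and one seed label per class.
pool = [i / 200 for i in range(1, 200)]
seed = [(0.0, 0), (1.0, 1)]
threshold, labeled = active_learn(pool, seed, budget=10)
print(f"estimated boundary: {threshold:.4f} using {len(labeled)} labels")
```

In this toy setting the learner behaves like a binary search over the pool, so a handful of queries narrows the boundary far more than labeling the same number of randomly chosen points would be expected to.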
