Compact Dual Ensembles for Active Learning

Generic ensemble methods can achieve excellent learning performance, but are not good candidates for active learning because of their different design purposes. We investigate how to use diversity of the member classifiers of an ensemble for efficient active learning. We empirically show, using benchmark data sets, that (1) to achieve a good (stable) ensemble, the number of classifiers needed in the ensemble varies for different data sets; (2) feature selection can be applied for classifier selection from ensembles to construct compact ensembles with high performance. Benchmark data sets and a real-world application are used to demonstrate the effectiveness of the proposed approach.

[1]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques, 3rd Edition , 1999 .

[2]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[3]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[4]  Huan Liu,et al.  Class-specific ensembles for active learning in digital imagery , 2004 .

[5]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.

[6]  Kamal Nigamyknigam,et al.  Employing Em in Pool-based Active Learning for Text Classiication , 1998 .

[7]  Pat Langley,et al.  Editorial: On Machine Learning , 1986, Machine Learning.

[8]  Andrew McCallum,et al.  Toward Optimal Active Learning through Sampling Estimation of Error Reduction , 2001, ICML.

[9]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[10]  H. Sebastian Seung,et al.  Selective Sampling Using the Query by Committee Algorithm , 1997, Machine Learning.

[11]  H. Sebastian Seung,et al.  Query by committee , 1992, COLT '92.

[12]  William A. Gale,et al.  A sequential algorithm for training text classifiers , 1994, SIGIR '94.

[13]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[14]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[15]  Ian Witten,et al.  Data Mining , 2000 .

[16]  Thomas G. Dietterich Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.