Active ensemble learning: Application to data mining and bioinformatics

This paper describes a new family of learning procedures proposed by the authors. The method combines active learning with the accuracy-enhancement techniques of bagging and boosting, and may be called active ensemble learning. Each procedure achieves highly accurate learning by iteratively selecting (querying) small amounts of highly informative data from a data space or database. This paper describes not only the technical aspects of the method but also the results of its application to two real-world problems: active planning of biochemical and molecular-biological experiments in immunology, and customer segmentation from a large-scale body of data in the CRM (customer relationship management) field. It is demonstrated that the proposed methods can achieve greater data efficiency and prediction accuracy than conventional methods. © 2007 Wiley Periodicals, Inc. Syst Comp Jpn, 38(11): 100–108, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.10355
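As a rough illustration of the query-by-bagging flavor of this approach, the sketch below trains a bagged committee of decision trees on the currently labeled data and queries the pool points on which the committee disagrees most, measured here by vote entropy. The function name, committee size, and disagreement measure are illustrative assumptions for a minimal sketch, not the authors' algorithm.

```python
# Minimal sketch of query-by-bagging active learning (one variant of the
# active ensemble learning idea described above). Names and parameters
# are illustrative assumptions, not the authors' implementation.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def query_by_bagging(X_labeled, y_labeled, X_pool,
                     n_members=10, n_queries=5):
    """Train a bagged committee and return indices of the pool points
    on which the committee disagrees most (vote entropy)."""
    n = len(X_labeled)
    votes = []
    for _ in range(n_members):
        # Bootstrap resample the labeled data (bagging).
        idx = rng.integers(0, n, size=n)
        member = DecisionTreeClassifier(max_depth=3, random_state=0)
        member.fit(X_labeled[idx], y_labeled[idx])
        votes.append(member.predict(X_pool))
    votes = np.array(votes)                      # shape: (n_members, n_pool)

    # Vote entropy: high when committee members split their predictions.
    classes = np.unique(y_labeled)
    probs = np.stack([(votes == c).mean(axis=0) for c in classes], axis=1)
    entropy = -np.sum(np.where(probs > 0, probs * np.log(probs), 0.0), axis=1)

    # Query the most ambiguous (largest-entropy) pool points.
    return np.argsort(entropy)[::-1][:n_queries]
```

In a full active-learning loop, the returned pool points would be labeled (e.g., by running the corresponding experiment or database lookup), appended to the labeled set, and the procedure repeated until the labeling budget is exhausted.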
