The application of AdaBoost for distributed, scalable and on-line learning

We propose to use AdaBoost to efficiently learn classifiers over very large and possibly distributed data sets that cannot fit into main memory, as well as for on-line learning where new data become available periodically. We propose two new ways to apply AdaBoost. The first allows the use of a small sample of the weighted training set to compute a weak hypothesis. The second uses AdaBoost as a means to re-weight the classifiers in an ensemble, and thus to reuse previously computed classifiers along with a new classifier computed on a new increment of data. These two techniques of using AdaBoost provide scalable, distributed and on-line learning. We discuss these methods and their implementation in JAM, an agent-based learning system. Empirical studies on four real-world and artificial data sets show results that are either comparable to or better than learning classifiers over the complete training set and, in some cases, are comparable to boosting on the complete data set. However, our algorithms use much smaller samples of the training set and require much less memory.
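To make the two techniques concrete: the first draws a small sample from the current boosting weights and trains the weak hypothesis only on that sample; the second runs the boosting rounds solely to assign voting weights to classifiers that were already trained, for example on earlier increments of data. The following is a minimal sketch of both ideas, not the JAM implementation; the function names, the use of NumPy and scikit-learn, and the choice of decision stumps as weak learners are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_with_subsample(X, y, n_rounds=10, sample_size=1000, seed=None):
    """Discrete AdaBoost where each weak hypothesis is trained on a small
    sample drawn from the current weight distribution (hypothetical sketch).
    Assumes X is a NumPy array and y takes values in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)              # boosting weights over the full training set
    hypotheses, alphas = [], []
    for _ in range(n_rounds):
        # Draw a small weighted sample instead of passing the whole
        # weighted training set to the weak learner.
        idx = rng.choice(n, size=min(sample_size, n), replace=True, p=w)
        h = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        pred = h.predict(X)
        err = np.clip(np.dot(w, pred != y), 1e-10, 1 - 1e-10)  # weighted error on full set
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)   # up-weight misclassified examples
        w /= w.sum()
        hypotheses.append(h)
        alphas.append(alpha)
    return hypotheses, alphas

def reweight_ensemble(classifiers, X, y):
    """Use AdaBoost rounds only to compute voting weights for an existing
    ensemble, treating each previously computed classifier in turn as the
    weak hypothesis of one round (hypothetical sketch)."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    alphas = []
    for h in classifiers:
        pred = h.predict(X)
        err = np.clip(np.dot(w, pred != y), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        alphas.append(alpha)
    return alphas

def predict(hypotheses, alphas, X):
    """Weighted-vote prediction of the boosted ensemble."""
    votes = sum(a * h.predict(X) for h, a in zip(hypotheses, alphas))
    return np.sign(votes)
```

In this sketch the final classifier is the same weighted vote in both settings; only the origin of the weak hypotheses differs, which is what allows the second variant to reuse classifiers computed on remote or earlier partitions of the data.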
