The distributed boosting algorithm

In this paper, we propose a general framework for distributed boosting, intended for efficiently integrating specialized classifiers learned over very large, distributed, homogeneous databases that cannot be merged at a single location. Our distributed boosting algorithm can also be used as a parallel classification technique, in which a massive database that cannot fit into main memory is partitioned into disjoint subsets for more efficient analysis. In the proposed method, at each boosting round the classifiers are first learned from the disjoint datasets and then exchanged among the sites. Finally, on each disjoint dataset the classifiers are combined into a weighted voting ensemble. The ensemble applied to an unseen test set is therefore an ensemble of ensembles built on all distributed sites. In experiments on four large datasets, the proposed distributed boosting method achieved classification accuracy comparable to, or even slightly better than, that of the standard boosting algorithm, while requiring less memory and less computation time. In addition, the communication overhead of the distributed boosting algorithm is very small, making it a viable alternative to standard boosting for large-scale databases.
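The abstract describes the per-round protocol only at a high level. The following is one simplified reading of it, not the authors' algorithm. It assumes binary labels in {-1, +1}, decision stumps as the weak learner, AdaBoost-style vote weights, and it simulates the exchange of classifiers inside a single process. Each site reweights only its own examples, builds its own weighted vote over all sites' classifiers, and the final prediction sums the site-level boosted ensembles (an ensemble of ensembles). The names `distributed_boost`, `fit_stump`, `vote` and `predict` are hypothetical.

```python
import math
from typing import List, Tuple

# Simplified sketch, NOT the paper's exact procedure. Assumptions: binary
# labels in {-1, +1}, decision stumps as weak learners, AdaBoost-style vote
# weights, and classifier "exchange" simulated inside a single process.

Point = Tuple[List[float], int]   # (feature vector, label)
Stump = Tuple[int, float, int]    # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: List[float]) -> int:
    feat, thr, pol = stump
    return pol if x[feat] <= thr else -pol


def fit_stump(data: List[Point], w: List[float]) -> Stump:
    """Weak learner: weighted decision stump found by exhaustive search."""
    best, best_err = (0, data[0][0][0], 1), float("inf")
    for feat in range(len(data[0][0])):
        for thr in sorted({x[feat] for x, _ in data}):
            for pol in (1, -1):
                err = sum(wi for (x, y), wi in zip(data, w)
                          if stump_predict((feat, thr, pol), x) != y)
                if err < best_err:
                    best, best_err = (feat, thr, pol), err
    return best


def alpha(err: float) -> float:
    """AdaBoost vote weight; err is clamped so the log stays finite."""
    err = min(max(err, 1e-10), 1 - 1e-10)
    return 0.5 * math.log((1 - err) / err)


def vote(votes, x) -> int:
    """Weighted vote over (weight, stump) pairs; ties go to +1."""
    return 1 if sum(a * stump_predict(s, x) for a, s in votes) >= 0 else -1


def distributed_boost(sites: List[List[Point]], rounds: int):
    weights = [[1.0 / len(d)] * len(d) for d in sites]
    models = [[] for _ in sites]  # per site: [(round weight, votes), ...]
    for _ in range(rounds):
        # 1) each site learns one classifier from its own disjoint data
        local = [fit_stump(d, w) for d, w in zip(sites, weights)]
        # 2) "exchange": every site now holds all k classifiers in `local`
        for j, (data, w) in enumerate(zip(sites, weights)):
            # 3) site-specific weighted vote, weights estimated on local data
            votes = []
            for s in local:
                err = sum(wi for (x, y), wi in zip(data, w)
                          if stump_predict(s, x) != y)
                votes.append((alpha(err), s))
            preds = [vote(votes, x) for x, _ in data]
            a = alpha(sum(wi for p, (_, y), wi in zip(preds, data, w) if p != y))
            models[j].append((a, votes))
            # 4) AdaBoost-style reweighting of this site's examples only
            new_w = [wi * math.exp(-a * y * p)
                     for p, (_, y), wi in zip(preds, data, w)]
            z = sum(new_w)
            weights[j] = [wi / z for wi in new_w]
    return models


def predict(models, x: List[float]) -> int:
    """Ensemble of ensembles: sum the boosted scores of every site's model."""
    score = sum(a * vote(votes, x) for site in models for a, votes in site)
    return 1 if score >= 0 else -1


# Toy example: three sites holding disjoint 1-D samples.
sites = [
    [([0.1], -1), ([0.4], -1), ([0.6], 1), ([0.9], 1)],
    [([0.2], -1), ([0.3], -1), ([0.7], 1), ([0.8], 1)],
    [([0.05], -1), ([0.45], -1), ([0.55], 1), ([0.95], 1)],
]
models = distributed_boost(sites, rounds=3)
print(predict(models, [0.15]), predict(models, [0.85]))
```

In this sketch, the only objects that cross site boundaries are the fitted stumps (three numbers each); the examples and their weights never leave the site that owns them.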

Zoran Obradovic | Aleksandar Lazarevic