A Comparison Study of Strategies for Combining Classifiers from Distributed Data Sources

Distributed data mining (DDM) is an important research area. The task of distributed data mining is to extract and integrate knowledge from different sources. Such tasks require approaches and tools different from those used for learning from data held in a single database. One approach suitable for DDM is to select relevant local patterns from the distributed databases. Such patterns, often called prototypes, are subsequently merged to create a compact representation of the distributed data repositories. A global classifier, called a combiner, can then be learned from this compact representation. The paper proposes and reviews several strategies for constructing combiner classifiers for DDM tasks. The suggested strategies are evaluated experimentally on several well-known benchmark data sets.
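The prototype-and-combiner pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the selection rule (keep, per class and per site, the instance closest to the class mean) and the 1-nearest-neighbour combiner are stand-ins assumed here for illustration, and the names `select_prototypes`, `build_combiner`, `site_a`, and `site_b` are hypothetical.

```python
from collections import defaultdict
from math import dist


def select_prototypes(local_data):
    """Select one prototype per class at a single site.

    Assumed rule (illustrative only): keep the instance closest to the
    class mean. The paper's own selection strategies are not reproduced.
    """
    by_class = defaultdict(list)
    for features, label in local_data:
        by_class[label].append(features)

    prototypes = []
    for label, rows in by_class.items():
        # Per-feature mean of all instances of this class at this site.
        mean = tuple(sum(col) / len(rows) for col in zip(*rows))
        closest = min(rows, key=lambda r: dist(r, mean))
        prototypes.append((closest, label))
    return prototypes


def build_combiner(sites):
    """Merge the prototypes of all sites and return a global classifier.

    The combiner here is a 1-nearest-neighbour rule over the merged
    prototypes, chosen only to keep the sketch short.
    """
    merged = [p for site in sites for p in select_prototypes(site)]

    def classify(x):
        nearest = min(merged, key=lambda p: dist(p[0], x))
        return nearest[1]

    return merged, classify


# Two toy "distributed" sites; only prototypes leave each site.
site_a = [((0.0, 0.1), "neg"), ((0.2, 0.0), "neg"), ((1.0, 1.1), "pos")]
site_b = [((0.9, 1.0), "pos"), ((1.1, 0.9), "pos"), ((0.1, 0.2), "neg")]

merged, classify = build_combiner([site_a, site_b])
```

In this sketch only the selected prototypes are moved to the global level, so the combiner is trained on a compact set rather than on the full contents of every local database.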
