Learning Rules from Distributed Data

This paper raises a concern about the accuracy, as a function of the degree of parallelism, of a certain class of distributed learning algorithms, and illustrates one proposed improvement. We focus on learning a single model from a set of disjoint data sets that are distributed across a set of computers. The model is a set of rules. The distributed data sets may be disjoint for any of several reasons. In our approach, the first step is to construct a rule set (model) for each of the original disjoint data sets. The rule sets are then merged until a final rule set is obtained that models the aggregate data. We compare this approach with directly creating a rule set from the aggregate data and show that it promises faster learning. However, accuracy can drop off as the degree of parallelism increases, so we describe an approach that extends the degree of parallelism that can be reached before this loss of accuracy takes over.
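The abstract names the pipeline (partition, learn one rule set per partition, merge until one rule set remains) but not the rule learner or the merge procedure. The following is a minimal sketch of that pipeline under stated assumptions, not the paper's algorithm. It assumes a rule is a conjunction of threshold tests plus a class label and a support count. The toy single-condition learner stands in for a real rule learner. The merge policy is a hypothetical choice: identical rules pool support, conflicting rules keep the better-supported one, and prediction uses the best-supported covering rule. All names (`learn_rules`, `merge`, `distributed_learn`, `classify`, `min_support`) are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical representation: a condition is (attribute index, "<=" or ">", threshold).
Condition = Tuple[int, str, float]
Example = Tuple[Sequence[float], str]  # (feature vector, class label)


@dataclass(frozen=True)
class Rule:
    conditions: FrozenSet[Condition]
    label: str
    support: int  # examples covered (all correctly) on the partition(s) it came from


def satisfies(conditions: FrozenSet[Condition], x: Sequence[float]) -> bool:
    return all(x[a] <= t if op == "<=" else x[a] > t for a, op, t in conditions)


def learn_rules(data: List[Example], min_support: int = 2) -> List[Rule]:
    """Toy stand-in for a real rule learner: keep every single-threshold
    rule that is 100% accurate on `data` and covers >= min_support examples."""
    rules: List[Rule] = []
    for a in range(len(data[0][0])):
        values = sorted({x[a] for x, _ in data})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            for op in ("<=", ">"):
                cond = frozenset({(a, op, t)})
                labels = [y for x, y in data if satisfies(cond, x)]
                if len(labels) >= min_support and len(set(labels)) == 1:
                    rules.append(Rule(cond, labels[0], len(labels)))
    return rules


def merge(r1: List[Rule], r2: List[Rule]) -> List[Rule]:
    """Union two rule sets (hypothetical policy). Identical rules pool their
    support; rules with the same conditions but different classes keep the
    better-supported one."""
    best: Dict[FrozenSet[Condition], Rule] = {}
    for rule in r1 + r2:
        old = best.get(rule.conditions)
        if old is None or (old.label != rule.label and rule.support > old.support):
            best[rule.conditions] = rule
        elif old.label == rule.label:
            best[rule.conditions] = Rule(rule.conditions, rule.label,
                                         old.support + rule.support)
    return list(best.values())


def distributed_learn(data: List[Example], k: int) -> List[Rule]:
    """Split `data` into k disjoint partitions (assumes 1 <= k <= len(data)),
    learn one rule set per partition, then merge pairwise until one remains."""
    partitions = [data[i::k] for i in range(k)]  # disjoint, round-robin split
    # Each call is independent, so these could run on k separate machines.
    rule_sets = [learn_rules(p) for p in partitions]
    while len(rule_sets) > 1:
        merged = [merge(rule_sets[i], rule_sets[i + 1])
                  for i in range(0, len(rule_sets) - 1, 2)]
        if len(rule_sets) % 2:  # odd one out passes to the next round unchanged
            merged.append(rule_sets[-1])
        rule_sets = merged
    return rule_sets[0]


def classify(rules: List[Rule], x: Sequence[float], default: str) -> str:
    """Predict with the best-supported covering rule, else the default class."""
    covering = [r for r in rules if satisfies(r.conditions, x)]
    return max(covering, key=lambda r: r.support).label if covering else default


# Usage: one attribute, class "a" below 10 and "b" from 10 upward.
data = [((float(i),), "a" if i < 10 else "b") for i in range(20)]
rules = distributed_learn(data, k=2)
print(classify(rules, (3.0,), default="a"), classify(rules, (15.0,), default="a"))  # a b
```

Merging pairwise in rounds forms a binary tree, so k partitions need only about log2(k) merge rounds, and the merges within a round are independent of one another. Because each local rule set sees only its own partition, the merged set can contain rules with slightly different thresholds for the same boundary; this is one simple way the accuracy concern raised above can show up as k grows.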
