Meta-learning in distributed data mining systems: Issues and approaches

Data mining systems aim to discover patterns and extract useful information from facts recorded in databases. A widely adopted approach to this objective is to apply various machine learning algorithms to compute descriptive models of the available data. Here, we explore one of the main challenges in this research area: the development of techniques that scale up to large and possibly physically distributed databases. Meta-learning is a technique that seeks to compute higher-level classifiers (or classification models), called meta-classifiers, that integrate in some principled fashion multiple classifiers computed separately over different databases. This study describes meta-learning and presents the JAM system (Java Agents for Meta-learning), an agent-based meta-learning system for large-scale data mining applications. Specifically, it identifies and addresses several important desiderata for distributed data mining systems that stem from their additional complexity compared to centralized, host-based systems. Distributed systems may need to deal with heterogeneous platforms, with multiple databases and (possibly) different schemas, with the design and implementation of scalable and effective protocols for communication among the data sites, and with the selective and efficient use of the information gathered from peer data sites. Other important problems intrinsic to data mining systems that must not be ignored include, first, the ability to take advantage of newly acquired information that was not available when existing models were computed and to combine it with those models, and second, the flexibility to incorporate new machine learning methods and data mining technologies. We explore these issues within the context of JAM and evaluate various proposed solutions through extensive empirical studies.
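To make the core idea concrete, the sketch below illustrates one common meta-learning scheme, stacking-style combination, in plain Java. All names here (Example, Classifier, trainMajority, metaLearn) are hypothetical stand-ins invented for illustration, not JAM's actual agent API: each site trains a base classifier on its local data, the base classifiers' predictions over a shared validation set form a meta-level training set, and a meta-classifier learned from that set arbitrates among the base models.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * A minimal sketch of meta-learning by stacking over distributed partitions.
 * All types and methods are hypothetical stand-ins, not JAM's API.
 */
public class MetaLearningSketch {

    /** A labeled training example: feature vector plus binary class label. */
    record Example(double[] features, int label) {}

    /** Any base- or meta-level classification model. */
    interface Classifier {
        int predict(double[] features);
    }

    /** Trivial stand-in learner: always predicts the majority class it saw. */
    static Classifier trainMajority(List<Example> data) {
        long positives = data.stream().filter(e -> e.label() == 1).count();
        int majority = positives * 2 >= data.size() ? 1 : 0;
        return features -> majority;
    }

    /**
     * Meta-learning: each site trains a base classifier locally; a
     * meta-classifier is then trained on the base predictions over a
     * shared validation set, learning how to combine the base models.
     */
    static Classifier metaLearn(List<List<Example>> sites, List<Example> validation) {
        // 1. Train one base classifier per (possibly remote) data site.
        List<Classifier> base = new ArrayList<>();
        for (List<Example> siteData : sites) {
            base.add(trainMajority(siteData));
        }

        // 2. Build the meta-level training set: base predictions become features.
        List<Example> metaData = new ArrayList<>();
        for (Example e : validation) {
            double[] metaFeatures = new double[base.size()];
            for (int i = 0; i < base.size(); i++) {
                metaFeatures[i] = base.get(i).predict(e.features());
            }
            metaData.add(new Example(metaFeatures, e.label()));
        }

        // 3. Train the meta-classifier on the meta-level examples.
        Classifier meta = trainMajority(metaData);

        // 4. The combined model routes an input through every base
        //    classifier, then lets the meta-classifier arbitrate.
        return features -> {
            double[] metaFeatures = new double[base.size()];
            for (int i = 0; i < base.size(); i++) {
                metaFeatures[i] = base.get(i).predict(features);
            }
            return meta.predict(metaFeatures);
        };
    }

    public static void main(String[] args) {
        List<Example> siteA = List.of(new Example(new double[]{0.1}, 0),
                                      new Example(new double[]{0.9}, 1));
        List<Example> siteB = List.of(new Example(new double[]{0.8}, 1),
                                      new Example(new double[]{0.7}, 1));
        List<Example> validation = List.of(new Example(new double[]{0.2}, 0),
                                           new Example(new double[]{0.6}, 1));
        Classifier combined = metaLearn(List.of(siteA, siteB), validation);
        System.out.println("prediction: " + combined.predict(new double[]{0.5}));
    }
}
```

In JAM itself the base classifiers would be computed by learning agents at the remote sites, and only the learned models, not the raw data, would be exchanged among sites; the trivial majority-class learner above merely keeps the sketch self-contained.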
