On the Accuracy of Meta-learning for Scalable Data Mining

In this paper, we describe a general approach to scaling data mining applications that we have come to call meta-learning. Meta-learning refers to a general strategy that seeks to learn how to combine a number of separate learning processes in an intelligent fashion. We desire a meta-learning architecture that exhibits two key behaviors. First, the meta-learning strategy must produce an accurate final classification system. This means that a meta-learning architecture must produce a final outcome that is at least as accurate as a conventional learning algorithm applied to all available data. Second, it must be fast relative to an individual sequential learning algorithm applied to massive databases of examples, and it must operate in a reasonable amount of time. This paper focuses primarily on issues related to the accuracy and efficacy of meta-learning as a general strategy. A number of empirical results are presented demonstrating that meta-learning is technically feasible in wide-area network computing environments.
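To make the strategy concrete, the following is a minimal sketch of a two-level meta-learning architecture in the style the abstract describes: base learners are trained independently on disjoint partitions of the data, and a meta-learner is then trained to combine their predictions. This is an illustration under assumed choices (scikit-learn classifiers, a simple disjoint split, stacking-style combination), not the paper's actual implementation.

```python
# Sketch: meta-learning over disjoint data partitions (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Toy data standing in for a massive database of examples.
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_train, X_meta, y_train, y_meta = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 1. Split the training data into disjoint subsets, one per base learner,
#    so each learning process can run independently (and in parallel).
n_partitions = 4
partitions = np.array_split(np.arange(len(X_train)), n_partitions)
base_learners = [
    DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
    for idx in partitions
]

# 2. Form meta-level training examples: the base learners' predictions on
#    a held-out set become the input features of the combining learner.
meta_features = np.column_stack(
    [clf.predict(X_meta) for clf in base_learners])

# 3. The meta-learner *learns* how to combine the base predictions,
#    rather than applying a fixed rule such as majority voting.
meta_learner = LogisticRegression().fit(meta_features, y_meta)

def predict(X_new):
    """Classify new examples through the two-level architecture."""
    base_preds = np.column_stack(
        [clf.predict(X_new) for clf in base_learners])
    return meta_learner.predict(base_preds)
```

Because each base learner sees only its own partition, training can be distributed across machines, which is what makes the approach a candidate for the wide-area network computing environments the paper evaluates.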
