Error reduction through learning multiple descriptions

Learning multiple descriptions for each class in the data has been shown to reduce generalization error, but the amount of error reduction varies greatly from domain to domain. This paper presents a novel empirical analysis that helps explain this variation. Our hypothesis is that the amount of error reduction is linked to the “degree to which the descriptions for a class make errors in a correlated manner.” We present a precise and novel definition of this notion and use twenty-nine data sets to show that the amount of observed error reduction is negatively correlated with the degree to which the descriptions make correlated errors. We show empirically that descriptions that make less correlated errors can be learned in domains in which many ties in the search evaluation measure (e.g., information gain) are experienced during learning. The paper also presents results that help explain when and why multiple descriptions help (domains with irrelevant attributes) and when they help less (domains with large amounts of class noise).
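
The link between correlated errors and error reduction can be illustrated with a minimal sketch. The abstract does not reproduce the paper's precise definition, so the measure below (the fraction of examples on which two descriptions make the same wrong prediction, averaged over all pairs) is an assumed stand-in, and the function names and toy predictions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def error_rate(preds, truth):
    """Fraction of examples on which preds disagrees with truth."""
    return sum(p != t for p, t in zip(preds, truth)) / len(truth)


def majority_vote(models):
    """Combine several prediction lists by per-example plurality vote."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*models)]


def mean_same_error(models, truth):
    """Assumed stand-in for error correlation (not the paper's definition):
    the fraction of examples on which a pair of models makes the same
    wrong prediction, averaged over all pairs of models."""
    rates = [
        sum(a == b and a != t for a, b, t in zip(m1, m2, truth)) / len(truth)
        for m1, m2 in combinations(models, 2)
    ]
    return sum(rates) / len(rates)


# Toy data: six examples, all of class 1; each model errs on two of them.
truth = [1, 1, 1, 1, 1, 1]

# Errors fall on different examples, so the vote can outnumber each mistake.
uncorrelated = [
    [0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
]

# Errors fall on the same examples, so the vote repeats the mistake.
correlated = [
    [0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]

for name, models in [("uncorrelated", uncorrelated), ("correlated", correlated)]:
    single = error_rate(models[0], truth)
    voted = error_rate(majority_vote(models), truth)
    shared = mean_same_error(models, truth)
    print(f"{name}: single={single:.2f} voted={voted:.2f} same-error={shared:.2f}")
```

In this toy setting every individual description has the same error rate, yet voting removes the error only when the descriptions err on different examples, which is consistent with the negative correlation the abstract reports.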
