Prototype Selection for Composite Nearest Neighbor Classifiers

Combining the predictions of a set of classifiers has been shown to be an effective way to create composite classifiers that are more accurate than any of their components. This increased accuracy has been demonstrated in a variety of real-world applications, ranging from protein sequence identification to determining the fat content of ground meat. Despite such individual successes, fundamental questions about classifier combination remain open, such as "Can classifiers from any given model class be combined to create a composite classifier with higher accuracy?" and "Is it possible to increase the accuracy of a given classifier by combining its predictions with those of only a small number of other classifiers?" The goal of this dissertation is to answer these and closely related questions for a particular model class, the class of nearest neighbor classifiers. We undertake the first in-depth study of the combination of nearest neighbor classifiers. Although previous research has questioned the utility of combining nearest neighbor classifiers, we introduce algorithms that combine a small number of component nearest neighbor classifiers, each of which stores only a small number of prototypical instances. Across a variety of domains, we show that these algorithms yield composite classifiers that are more accurate than a nearest neighbor classifier that stores all training instances as prototypes.

The research presented in this dissertation also extends previous work on prototype selection for an independent nearest neighbor classifier. We show that in many domains, storing a very small number of prototypes can provide classification accuracy greater than or equal to that of a nearest neighbor classifier that stores all training instances, and we demonstrate that algorithms relying primarily on random sampling can effectively choose that small number of prototypes.
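The core idea described above can be illustrated with a minimal sketch: each component is a 1-nearest-neighbor classifier over a few randomly sampled prototypes, and the composite takes a majority vote over the components. This is an illustrative reconstruction, not the dissertation's actual algorithms; for concreteness it samples prototypes per class (a stratified variant of random selection), and all function names and the toy two-class data are invented for the example.

```python
import random
from collections import Counter, defaultdict

def nearest_label(prototypes, x):
    """1-NN prediction: the label of the closest prototype (squared Euclidean distance)."""
    return min(prototypes, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def sample_prototypes(train, per_class, rng):
    """Randomly sample a small number of prototypes per class (hypothetical stratified variant)."""
    by_label = defaultdict(list)
    for inst in train:
        by_label[inst[1]].append(inst)
    return [p for insts in by_label.values() for p in rng.sample(insts, per_class)]

def composite_predict(components, x):
    """Majority vote over the component nearest neighbor classifiers."""
    votes = Counter(nearest_label(c, x) for c in components)
    return votes.most_common(1)[0][0]

# Toy demo: two well-separated classes in 2-D, five components with 2 prototypes per class.
rng = random.Random(0)
train = ([((rng.uniform(0, 1), rng.uniform(0, 1)), 'a') for _ in range(30)]
         + [((rng.uniform(2, 3), rng.uniform(2, 3)), 'b') for _ in range(30)])
components = [sample_prototypes(train, 2, rng) for _ in range(5)]
print(composite_predict(components, (0.5, 0.5)))  # 'a'
print(composite_predict(components, (2.5, 2.5)))  # 'b'
```

Note that each component stores only four prototypes, versus sixty for a nearest neighbor classifier that keeps every training instance; the composite's accuracy comes from the vote across components rather than from any single exhaustive prototype set.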
