Prototype Selection for Composite Nearest Neighbor Classifiers

Combining the predictions of a set of classifiers has been shown to be an effective way to create composite classifiers that are more accurate than any of their components. This increased accuracy has been demonstrated in a variety of real-world applications, ranging from protein sequence identification to determining the fat content of ground meat. Despite such individual successes, fundamental questions about classifier combination remain open, such as "Can classifiers from any given model class be combined to create a composite classifier with higher accuracy?" and "Is it possible to increase the accuracy of a given classifier by combining its predictions with those of only a small number of other classifiers?" The goal of this dissertation is to answer these and closely related questions for a particular model class, the class of nearest neighbor classifiers. We undertake the first in-depth study of the combination of nearest neighbor classifiers. Although previous research has questioned the utility of combining nearest neighbor classifiers, we introduce algorithms that combine a small number of component nearest neighbor classifiers, each of which stores only a small number of prototypical instances. Across a variety of domains, we show that these algorithms yield composite classifiers that are more accurate than a nearest neighbor classifier that stores all training instances as prototypes.

The research presented in this dissertation also extends previous work on prototype selection for an independent nearest neighbor classifier. We show that in many domains, storing a very small number of prototypes can provide classification accuracy greater than or equal to that of a nearest neighbor classifier that stores all training instances, and we demonstrate that algorithms relying primarily on random sampling can effectively choose that small number of prototypes.
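The core idea described above can be illustrated with a minimal sketch: each component is a 1-nearest-neighbor classifier over a few randomly sampled prototypes, and the composite takes a majority vote over the components. This is an illustrative reconstruction, not the dissertation's actual algorithms; for concreteness it samples prototypes per class (a stratified variant of random selection), and all function names and the toy two-class data are invented for the example.

```python
import random
from collections import Counter, defaultdict

def nearest_label(prototypes, x):
    """1-NN prediction: the label of the closest prototype (squared Euclidean distance)."""
    return min(prototypes, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def sample_prototypes(train, per_class, rng):
    """Randomly sample a small number of prototypes per class (hypothetical stratified variant)."""
    by_label = defaultdict(list)
    for inst in train:
        by_label[inst[1]].append(inst)
    return [p for insts in by_label.values() for p in rng.sample(insts, per_class)]

def composite_predict(components, x):
    """Majority vote over the component nearest neighbor classifiers."""
    votes = Counter(nearest_label(c, x) for c in components)
    return votes.most_common(1)[0][0]

# Toy demo: two well-separated classes in 2-D, five components with 2 prototypes per class.
rng = random.Random(0)
train = ([((rng.uniform(0, 1), rng.uniform(0, 1)), 'a') for _ in range(30)]
         + [((rng.uniform(2, 3), rng.uniform(2, 3)), 'b') for _ in range(30)])
components = [sample_prototypes(train, 2, rng) for _ in range(5)]
print(composite_predict(components, (0.5, 0.5)))  # 'a'
print(composite_predict(components, (2.5, 2.5)))  # 'b'
```

Note that each component stores only four prototypes, versus sixty for a nearest neighbor classifier that keeps every training instance; the composite's accuracy comes from the vote across components rather than from any single exhaustive prototype set.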
