Learning good prototypes for classification using filtering and abstraction of instances

We propose a framework for learning good prototypes, called prototype generation and filtering (PGF), which integrates the strengths of instance-filtering and instance-abstraction techniques through two different integration methods. The two methods differ in their filtering granularity and in how tightly the two techniques are coupled. To characterize the effect of the integration, we categorize instance-filtering techniques into three kinds, namely, (1) removing border instances, (2) retaining border instances, and (3) retaining center instances. We investigate the effect of using each kind of filtering in the different variants of our PGF framework. Experiments on 35 real-world benchmark data sets show that our PGF framework maintains or improves classification accuracy while achieving a significant improvement in data reduction over pure filtering and pure abstraction techniques, as well as over KNN and C4.5. A minimal illustrative sketch of the filter-then-abstract idea is given below.
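The sketch below is a hypothetical illustration of the general filter-then-abstract pipeline, not the paper's actual PGF algorithm: it assumes Wilson-style editing (retaining center instances) as the filtering step and per-class centroids as the abstraction step, with a 1-NN classifier over the resulting prototypes. Function names such as `wilson_edit` and `abstract_to_prototypes` are made up for this example.

```python
import numpy as np

def wilson_edit(X, y, k=3):
    """Retain 'center' instances: drop points misclassified by their k nearest
    neighbours (Wilson editing, one common instance-filtering scheme).
    Assumes y holds non-negative integer class labels."""
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        nn = np.argsort(d)[:k]
        votes = np.bincount(y[nn], minlength=y.max() + 1)
        if votes.argmax() == y[i]:
            keep.append(i)
    return X[keep], y[keep]

def abstract_to_prototypes(X, y):
    """Abstract the filtered instances: collapse each class to its mean vector
    (a simple instance-averaging step)."""
    protos, labels = [], []
    for c in np.unique(y):
        protos.append(X[y == c].mean(axis=0))
        labels.append(c)
    return np.vstack(protos), np.array(labels)

def classify(x, protos, labels):
    """Classify a query point by its nearest generated prototype (1-NN)."""
    return labels[np.argmin(np.linalg.norm(protos - x, axis=1))]

# Usage: filter the training set, abstract it into prototypes, then classify.
# X_train, y_train = ...   (numeric features, integer-coded labels)
# Xf, yf = wilson_edit(X_train, y_train, k=3)
# protos, labels = abstract_to_prototypes(Xf, yf)
# pred = classify(x_query, protos, labels)
```

Real PGF variants would interleave or couple the two steps more tightly and keep multiple prototypes per class; this sketch only shows the loosely coupled, two-stage form of the idea.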
