Data Reduction via Instance Selection

Selection pressures are pervasive. As data grows, so does the demand for data reduction to keep data mining effective. Instance selection is one of the effective means of data reduction. This chapter expounds the basic concepts of instance selection, including its context, necessity, and functionality. It briefly introduces state-of-the-art methods for instance selection and presents an overview of the field, together with a summary of the contributing chapters in this collection. Its coverage also includes evaluation issues, related work, and future directions.
