Customized classification learning based on query projections

We develop QPL, a customized classification learning method based on query projections. Given an instance to be classified (the query instance), QPL explores the query's projections (QPs), which are subsets of attribute values shared by the query and training instances. QPL examines the training data distribution associated with each QP to decide whether the QP is useful, and makes the final prediction for the query by combining statistics of the selected useful QPs. Unlike existing instance-based learning methods, QPL does not need to compute a distance measure between instances. Exploiting QPs lets the learner explore a richer hypothesis space and strike a balance between precision and robustness. Another characteristic of QPL is that the target class may vary across query instances within a given data set. We have evaluated our method on synthetic and benchmark data sets; the results demonstrate that QPL achieves good performance and high reliability.
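The abstract's description suggests the following minimal sketch of QP-based prediction in Python. It is an illustration under stated assumptions, not the authors' exact algorithm: it assumes discrete attribute values, and the function name `qpl_predict`, the cap on QP size, and the support/purity usefulness test are illustrative stand-ins for the paper's distribution-based criterion for selecting useful QPs.

```python
from itertools import combinations
from collections import Counter

def qpl_predict(query, train, max_qp_size=2, min_support=3, min_purity=0.8):
    """Classify `query` by examining its query projections (QPs).

    query : dict mapping attribute name -> value
    train : list of (attribute_dict, class_label) pairs

    The min_support / min_purity test below is an assumed, illustrative
    usefulness criterion, not the one defined in the paper.
    """
    items = sorted(query.items())
    votes = Counter()
    for size in range(1, max_qp_size + 1):
        for qp in combinations(items, size):
            # Training instances whose attribute values match this QP.
            matched = [label for attrs, label in train
                       if all(attrs.get(a) == v for a, v in qp)]
            if len(matched) < min_support:
                continue  # too little supporting evidence; skip this QP
            dist = Counter(matched)
            top_class, top_count = dist.most_common(1)[0]
            purity = top_count / len(matched)
            if purity >= min_purity:
                # A "useful" QP votes for its dominant class, weighted by
                # how strongly the training distribution supports it.
                votes[top_class] += purity * len(matched)
    if not votes:
        # Fall back to the overall majority class when no QP qualifies.
        return Counter(label for _, label in train).most_common(1)[0][0]
    return votes.most_common(1)[0][0]
```

For example, with train = [({"color": "red", "shape": "round"}, "apple"), ...] and query = {"color": "red", "shape": "round"}, the sketch enumerates QPs such as {color=red} and {color=red, shape=round}, keeps those whose matching training instances are sufficiently numerous and pure, and combines their weighted votes. Note that no inter-instance distance is ever computed, matching the property claimed in the abstract.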
