The Inverse Classification Problem

In this paper, we examine an emerging variation of the classification problem, which is known as the inverse classification problem. In this problem, we determine the features to be used to create a record which will result in a desired class label. Such an approach is useful in applications in which it is an objective to determine a set of actions to be taken in order to guide the data mining application towards a desired solution. This system can be used for a variety of decision support applications which have pre-determined task criteria. We will show that the inverse classification problem is a powerful and general model which encompasses a number of different criteria. We propose a number of algorithms for the inverse classification problem, which use an inverted list representation for intermediate data structure representation and classification. We validate our approach over a number of real datasets.

[1]  David G. Stork,et al.  Pattern Classification , 1973 .

[2]  Sanjay Ranka,et al.  CLOUDS: A Decision Tree Classifier for Large Datasets , 1998, KDD.

[3]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[4]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[5]  RamakrishnanRaghu,et al.  BOAToptimistic decision tree construction , 1999 .

[6]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[7]  Johannes Gehrke,et al.  BOAT—optimistic decision tree construction , 1999, SIGMOD '99.

[8]  Alexander Dockhorn,et al.  Classification Algorithms , 2017, Encyclopedia of Machine Learning and Data Mining.

[9]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.

[10]  Philip S. Yu,et al.  A framework for on-demand classification of evolving data streams , 2006, IEEE Transactions on Knowledge and Data Engineering.

[11]  Jerome H. Friedman,et al.  A Recursive Partitioning Decision Rule for Nonparametric Classification , 1977, IEEE Transactions on Computers.

[12]  Yufei Tao,et al.  Reverse Nearest Neighbor Search in Metric Spaces , 2006, IEEE Transactions on Knowledge and Data Engineering.

[13]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[14]  Carla E. Brodley,et al.  Multivariate decision trees , 2004, Machine Learning.

[15]  Ron Kohavi,et al.  The Power of Decision Tables , 1995, ECML.

[16]  David W. Aha,et al.  Simplifying decision trees: A survey , 1997, The Knowledge Engineering Review.

[17]  Elke Achtert,et al.  Reverse k-nearest neighbor search in dynamic and general metric databases , 2009, EDBT '09.

[18]  Alexander G. Gray,et al.  Retrofitting Decision Tree Classifiers Using Kernel Density Estimation , 1995, ICML.

[19]  J. Ross Quinlan,et al.  Simplifying Decision Trees , 1987, Int. J. Man Mach. Stud..

[20]  Chuan Yi Tang,et al.  A 2.|E|-Bit Distributed Algorithm for the Directed Euler Trail Problem , 1993, Inf. Process. Lett..

[21]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[22]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[23]  David G. Stork,et al.  Pattern classification, 2nd Edition , 2000 .

[24]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[25]  J. Ross Quinlan,et al.  Simplifying decision trees , 1987, Int. J. Hum. Comput. Stud..