Paper Information - PUB: A Class Description Technique Based on Partial Coverage of Subspace


A good description of a class should be accurate and interpretable. Previous work describes classes either by analyzing the correlation of each attribute with the class or by producing rules, as when building a classifier. These solutions suffer from issues in accuracy and interpretability. A description naturally consists of sentences, where each sentence consists of a set of terms. Normally, a sentence is defined as a disjunction or conjunction of several terms, each of which specifies a constraint (a range or set of values) on an attribute. From the data analysis point of view, a sentence specifies a subspace in the database. In this paper, we create a richer yet interpretable form of sentence: a sentence describes an object if any $k$ attributes of that object satisfy the specified constraints. To that end, we design PUB, an algorithm that produces descriptions with our form of sentences. While constructing a sentence (within the description), PUB finds the optimal range or set of values for each attribute in linear time. We also show empirically that PUB is efficient and able to produce more accurate, concise and interpretable descriptions than current approaches on various real datasets.
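The sentence form in the abstract can be illustrated with a small membership test. The sketch below is not the paper's algorithm: it only checks whether a given sentence covers an object, and it assumes that "any $k$ attributes" means "at least $k$ of the constrained attributes". The names `satisfies` and `sentence_covers`, the attribute names, and the sample values are all hypothetical.

```python
from typing import Any, Dict, Tuple, Union

# Assumption: a constraint is either a closed numeric range (low, high)
# or a set of allowed categorical values.
Constraint = Union[Tuple[float, float], frozenset]


def satisfies(value: Any, constraint: Constraint) -> bool:
    """Return True if a single attribute value meets its constraint."""
    if isinstance(constraint, tuple):
        low, high = constraint
        return low <= value <= high
    return value in constraint


def sentence_covers(obj: Dict[str, Any],
                    sentence: Dict[str, Constraint],
                    k: int) -> bool:
    """Return True if at least k of the sentence's terms hold for obj.

    Assumption: attributes missing from obj count as unsatisfied.
    """
    hits = sum(
        1
        for attr, constraint in sentence.items()
        if attr in obj and satisfies(obj[attr], constraint)
    )
    return hits >= k


# Hypothetical sentence with three terms.
sentence = {
    "age": (30, 50),
    "income": (40000, 90000),
    "city": frozenset({"Oslo", "Bergen"}),
}

# Hypothetical object: satisfies the age and city terms, not the income term.
obj = {"age": 42, "income": 120000, "city": "Oslo"}

covered_k1 = sentence_covers(obj, sentence, k=1)  # behaves like a disjunction
covered_k2 = sentence_covers(obj, sentence, k=2)  # partial coverage
covered_k3 = sentence_covers(obj, sentence, k=3)  # behaves like a conjunction
```

Under this reading, setting `k` to 1 reduces the sentence to a disjunction of its terms and setting `k` to the number of terms reduces it to a conjunction, so the form covers both conventional cases and the settings in between.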

Vivekanand Gopalkrishnan | Ardian Kristanto Poernomo
