Tree structure for efficient data mining using rough sets

In data mining, an important goal is to generate an abstraction of the data. Such an abstraction reduces the space and search time requirements of the overall decision-making process. It is also important that the abstraction be generated with a small number of disk scans. We propose a novel data structure, the pattern count tree (PC-tree), that can be built by scanning the database only once. The PC-tree is a minimal-size, complete representation of the data, and it can be used to represent dynamic databases with the help of knowledge that is either static or changing. We show that further compactness can be achieved by constructing the PC-tree on segmented patterns. We exploit the flexibility offered by rough sets to realize a rough PC-tree and use it for efficient and effective rough classification. To be consistent with the sizes of the branches of the PC-tree, we use upper and lower approximations of feature sets in a manner different from conventional rough set theory. We conducted experiments with the proposed classification scheme on a large-scale handwritten digit data set, and we use the results to establish the efficacy of the proposed approach.
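
The single-scan construction can be illustrated with a minimal sketch. The abstract does not specify the PC-tree's node layout, so the code below assumes a plain prefix tree with per-node counts, where each pattern is given as an ordered sequence of integer feature indices. The names `Node`, `build_count_tree`, and `count_nodes` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Sequence


@dataclass
class Node:
    """A tree node: one feature plus the number of patterns passing through it."""
    feature: int
    count: int = 0
    children: Dict[int, "Node"] = field(default_factory=dict)


def build_count_tree(patterns: Iterable[Sequence[int]]) -> Node:
    """Build a prefix tree with counts in a single pass over the patterns.

    Assumption: every pattern lists its features in one fixed global order,
    so patterns that share a prefix also share a branch.
    """
    root = Node(feature=-1)  # sentinel root; its count is the number of patterns
    for pattern in patterns:  # each pattern is read exactly once
        node = root
        node.count += 1
        for feature in pattern:
            child = node.children.get(feature)
            if child is None:
                child = Node(feature)
                node.children[feature] = child
            child.count += 1
            node = child
    return root


def count_nodes(node: Node) -> int:
    """Count the nodes in the tree, including the sentinel root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Four toy patterns; the first three share the prefix (1, 2).
patterns = [(1, 2, 3), (1, 2, 4), (1, 2, 3), (2, 5)]
tree = build_count_tree(patterns)
```

In this toy input the four patterns hold 11 feature entries in total, while the tree stores them in 6 feature nodes plus the root, because shared prefixes and duplicate patterns collapse into counts. New patterns can be added later by running the same insertion loop, which is one way a structure of this kind could follow a changing database.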
