Rough set theory in very large databases

Rough set theory is an elegant and powerful methodology for extracting and minimizing rules from decision tables and Pawlak information systems. Its central notions are core, reduct, and knowledge dependency. Finding a minimal reduct has been shown to be NP-hard, so its computational complexity has implicitly restricted its effective application to small, clean data sets. In this paper, rough set methodology is extended to very large relational databases, at some sacrifice of elegance. In essence, techniques for extracting nice subsets of data from noisy data banks are integrated with rough set theory. Given a very large database, a sequence of interconnected Pawlak information systems (PIS) of various sizes is extracted from it. These PIS represent certain patterns in the data bank. By applying rough set methodology to these PIS, soft rules can be mined effectively. However, these rules may not correspond to a minimal reduct.
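To make the notions of knowledge dependency, reduct, and core concrete, the following is a minimal Python sketch on a toy decision table. It is not the method of this paper: the table, the attribute names `a`, `b`, `c`, `d`, and the helper functions are all assumptions chosen for illustration. The search enumerates attribute subsets by brute force, which is exponential in the number of condition attributes and is one way to see why reduct computation does not scale to large tables.

```python
from itertools import combinations

# Hypothetical toy decision table: condition attributes a, b, c; decision d.
# Here c duplicates b, and d equals (a OR b).
TABLE = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 1},
]
COND = ("a", "b", "c")
DEC = "d"


def partition(rows, attrs):
    """Group row indices into indiscernibility classes with respect to attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), []).append(i)
    return list(blocks.values())


def dependency(rows, attrs, dec):
    """Degree of dependency: share of rows lying in the positive region,
    i.e. in classes whose members all agree on the decision attribute."""
    positive = 0
    for block in partition(rows, attrs):
        if len({rows[i][dec] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def reducts(rows, cond, dec):
    """All minimal subsets of cond that preserve the full dependency degree.
    Brute force over subsets by increasing size (exponential in len(cond))."""
    target = dependency(rows, cond, dec)
    found = []
    for size in range(1, len(cond) + 1):
        for subset in combinations(cond, size):
            # Skip supersets of reducts already found; they are not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if dependency(rows, subset, dec) == target:
                found.append(subset)
    return found


def core(rows, cond, dec):
    """The core is the intersection of all reducts."""
    all_reducts = reducts(rows, cond, dec)
    if not all_reducts:
        return set()
    return set.intersection(*(set(r) for r in all_reducts))


if __name__ == "__main__":
    print(reducts(TABLE, COND, DEC))  # [('a', 'b'), ('a', 'c')]
    print(core(TABLE, COND, DEC))     # {'a'}
```

In this toy table either `b` or `c` can be dropped without losing any ability to determine `d`, so there are two reducts, and only `a` belongs to both, which makes it the core.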
