Formation of a Compact Reduct Set Based on Discernibility Relation and Attribute Dependency of Rough Set Theory

Large amounts of data are collected routinely in the course of day-to-day work in many fields. Typically, the datasets grow constantly and accumulate a large number of features, which are not equally important for decision-making. Moreover, the information is often incomplete and has relatively low information density. Dimensionality reduction is therefore a fundamental area of research in data mining. Rough Set Theory (RST), a mathematical approach to imprecise and incomplete data, has become very popular for dimensionality reduction of large datasets. The method is used to determine a subset of attributes, called a reduct, which can predict the decision concepts. In this paper, the concepts of discernibility relation and attribute dependency are integrated to form a compact reduct set, which not only reduces the complexity of the system but also helps it achieve higher accuracy. The performance of the proposed method has been evaluated by comparing its classification accuracy with that of some existing dimension reduction algorithms, and the proposed method gives superior results.
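
The two rough-set notions named above can be illustrated on a small example. The sketch below is not the algorithm proposed in the paper; it only uses the standard RST definitions, assuming a hypothetical toy decision table with conditional attributes a, b, c and decision attribute d. The dependency degree is the fraction of objects whose equivalence class under the chosen attributes is consistent in the decision, and the core is read off the singleton entries of the discernibility matrix. The function names and the greedy completion step are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy decision table (not from the paper).
TABLE = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
    {"a": 0, "b": 1, "c": 1, "d": "no"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
]


def dependency(table, attrs, decision):
    """Dependency degree: |positive region| / |universe|."""
    classes = defaultdict(list)
    for row in table:
        # Objects with equal values on attrs are indiscernible.
        classes[tuple(row[a] for a in attrs)].append(row[decision])
    # A class belongs to the positive region if its decisions all agree.
    positive = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return positive / len(table)


def core_attributes(table, attrs, decision):
    """Attributes that alone discern some pair with different decisions."""
    core = set()
    for x, y in combinations(table, 2):
        if x[decision] == y[decision]:
            continue
        # Discernibility matrix entry for the pair (x, y).
        entry = [a for a in attrs if x[a] != y[a]]
        if len(entry) == 1:
            core.add(entry[0])
    return core


def greedy_reduct(table, attrs, decision):
    """Start from the core, then add attributes by dependency gain.

    A simple heuristic; the result is not guaranteed to be minimal.
    """
    target = dependency(table, attrs, decision)
    reduct = sorted(core_attributes(table, attrs, decision))
    while dependency(table, reduct, decision) < target:
        remaining = [a for a in attrs if a not in reduct]
        best = max(
            remaining,
            key=lambda a: dependency(table, reduct + [a], decision),
        )
        reduct.append(best)
    return reduct


if __name__ == "__main__":
    attrs = ["a", "b", "c"]
    for a in attrs:
        print(a, dependency(TABLE, [a], "d"))
    print("core:", sorted(core_attributes(TABLE, attrs, "d")))
    print("reduct:", greedy_reduct(TABLE, attrs, "d"))
```

In this toy table the core already has the same dependency degree as the full attribute set, so the greedy step adds nothing; on real data the core is often only a starting point.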
