Assessment of data granulations in the context of the feature extraction problem

In this paper we investigate a method for measuring the quality of a data granulation in a decision system, where the granulation is defined by an indiscernibility relation in a specific type of approximation space. The proposed algorithm uses the concept of a random probe to estimate the probability that a given data granulation is relevant in a classification context. We explain the intuition behind our approach and show how it can be used in practical data analysis tasks such as attribute selection or the construction of new attributes. We also examine the relationship between the problem of finding a useful granulation of data and that of extracting informative features for supervised classification. To guard against derived granules of low relevance, we perform a random probe test to verify their validity. This technique allows a more objective assessment of the usefulness of a given data granulation for the classification problem at hand.
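
The idea of a random probe test can be illustrated with a minimal sketch. The abstract does not specify the quality measure or how probes are generated, so everything below is an assumption made for illustration: the hypothetical `granulation_quality` scores a granulation by the fraction of objects that carry the majority decision of their granule, and the hypothetical `random_probe_test` builds each probe by randomly permuting the granule labels, which keeps granule sizes but breaks any link to the decision attribute. The returned value estimates how often a random granulation of the same shape scores at least as well as the one under test.

```python
import random
from collections import Counter, defaultdict


def granulation_quality(granule_ids, decisions):
    """Fraction of objects matching the majority decision of their granule.

    This measure is an illustrative assumption, not the paper's measure.
    """
    by_granule = defaultdict(Counter)
    for granule, decision in zip(granule_ids, decisions):
        by_granule[granule][decision] += 1
    majority_total = sum(max(counts.values()) for counts in by_granule.values())
    return majority_total / len(decisions)


def random_probe_test(granule_ids, decisions, n_probes=1000, seed=0):
    """Estimate how likely a random probe is to match the observed quality.

    Each probe is a random permutation of the granule labels (an assumed
    probe construction). A small returned probability suggests that the
    granulation is relevant to the decision rather than a chance artifact.
    """
    rng = random.Random(seed)
    observed = granulation_quality(granule_ids, decisions)
    probe = list(granule_ids)
    hits = 0
    for _ in range(n_probes):
        rng.shuffle(probe)
        if granulation_quality(probe, decisions) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_probes + 1)


if __name__ == "__main__":
    # Toy decision system: three granules that align with the decision.
    granules = [0, 0, 0, 1, 1, 1, 2, 2]
    decisions = ["a", "a", "a", "b", "b", "b", "a", "a"]
    quality, probability = random_probe_test(granules, decisions)
    print(f"quality={quality:.3f}, probe probability={probability:.3f}")
```

In this sketch a granulation would be kept only if the estimated probability falls below a threshold chosen by the analyst. A trivial granulation that places all objects in one granule always yields a probability of 1, since every probe is identical to it.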
