Feature grouping-based fuzzy-rough feature selection

High data dimensionality has become a pervasive problem in many areas that require the learning of interpretable models. The problem has grown more pronounced in recent years with the seemingly relentless growth in the size of datasets. Indeed, as the number of dimensions increases, the number of data instances required to generate accurate models increases exponentially. Feature selection has therefore become not merely a useful step in the process of model learning, but an increasingly necessary one. Rough set and fuzzy-rough set theory have been used as such dataset pre-processors with much success; however, the underlying time and space complexity of the subset evaluation metric is an obstacle to the processing of very large data. This paper proposes a general approach to this problem that employs a novel feature grouping step to alleviate the processing overhead for large datasets. The approach is framed within the context of (and applied to) fuzzy-rough sets, although it can be used with other subset evaluation techniques. The experimental evaluation demonstrates that considerable computational effort can be avoided and, as a result, efficiency can be improved substantially for larger datasets.
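
To make the idea of a grouping step concrete, the following is a minimal sketch and not the method of the paper. It assumes that features are grouped by absolute Pearson correlation against a threshold, that only the first member of each group is offered to a greedy forward search, and that the subset evaluator is the crisp rough-set dependency degree (used here as a simple stand-in for a fuzzy-rough measure). All function names and the threshold value are hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return 0.0 if su == 0 or sv == 0 else cov / (su * sv)


def group_features(rows, threshold=0.8):
    """Assumed grouping rule: a feature joins the group of the first
    ungrouped feature (the seed) if |correlation| with it >= threshold."""
    columns = list(zip(*rows))
    unassigned = list(range(len(columns)))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [
            j for j in unassigned
            if abs(pearson(columns[seed], columns[j])) >= threshold
        ]
        unassigned = [j for j in unassigned if j not in members]
        groups.append(members)
    return groups


def dependency(rows, labels, subset):
    """Crisp rough-set dependency degree: fraction of instances whose
    equivalence class under `subset` contains a single decision label."""
    seen = defaultdict(set)
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[j] for j in subset)
        seen[key].add(label)
        counts[key] += 1
    return sum(c for k, c in counts.items() if len(seen[k]) == 1) / len(labels)


def select_with_groups(rows, labels, threshold=0.8):
    """Greedy forward selection that evaluates one representative per group,
    so redundant group members never trigger a subset evaluation."""
    groups = group_features(rows, threshold)
    remaining = [g[0] for g in groups]
    selected, best = [], dependency(rows, labels, [])
    while remaining:
        score, feat = max(
            (dependency(rows, labels, selected + [f]), f) for f in remaining
        )
        if score <= best:
            break  # no representative improves the subset
        selected.append(feat)
        best = score
        remaining.remove(feat)
    return selected, groups


# Toy data: columns 0 and 1 are duplicates, as are columns 2 and 3.
rows = [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 1)]
labels = [0, 0, 1, 2]
selected, groups = select_with_groups(rows, labels)
```

On this toy data the search evaluates two candidates in its first pass rather than four, because each duplicated column is represented by its group seed. How much work is saved in practice would depend on the grouping criterion and on how redundant the features are.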
