An efficient feature selection algorithm for hybrid data

Feature selection for large-scale data sets is widely regarded as an important data preprocessing step in machine learning. Data sets in real databases are often hybrid, i.e., categorical and numerical attributes coexist. Based on the idea of decomposition and fusion, this paper studies an efficient feature selection approach for large-scale hybrid data sets. With this approach, an effective feature subset can be obtained in a much shorter time. Experiments were carried out on twelve UCI data sets, using two common classifiers as the evaluation function. The experimental results show that the proposed approach is both effective and efficient.
