Local rough set: A solution to rough data analysis in big data

Abstract

As a supervised learning method, classical rough set theory often requires a large amount of labeled data, and concept approximation and attribute reduction are its two key issues. In the age of big data, however, labeling data is expensive and laborious, and sometimes even infeasible, whereas unlabeled data are cheap and easy to collect. Techniques for rough data analysis in big data that take a semi-supervised approach with limited labeled data are therefore desirable. Although many concept approximation and attribute reduction algorithms have been proposed in classical rough set theory, these methods often fail to work well on big data with limited labels. The challenges to classical rough set theory can be summarized as three issues: the limited labeling of big data, computational inefficiency, and over-fitting in attribute reduction. To address these three challenges, we introduce a theoretical framework called the local rough set and develop a series of corresponding concept approximation and attribute reduction algorithms with linear time complexity, which work efficiently and effectively on big data with limited labels. Theoretical analysis and experimental results show that each algorithm in the local rough set significantly outperforms its counterpart in classical rough set theory. Notably, the advantage of the local rough set algorithms becomes more pronounced on larger data sets.
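To make the two classical operations named above concrete, here is a minimal Python sketch of Pawlak-style concept approximation and dependency-based attribute reduction on a toy decision table. The function names, the toy data, and the greedy forward-selection heuristic used for the reduct are illustrative assumptions; they are not the local rough set algorithms proposed in this paper. The hypothetical `lower_approx_local` function only illustrates one elementary observation: because every object belongs to its own granule, the lower approximation of a concept X can be found by visiting only the objects of X.

```python
from collections import defaultdict

# Hypothetical toy decision table: rows are objects, columns are condition attributes.
TABLE = [
    (1, 0, 1),
    (1, 0, 1),
    (0, 1, 1),
    (0, 1, 0),
    (1, 1, 0),
]
LABELS = ["y", "n", "y", "y", "n"]  # decision attribute (hypothetical)


def partition(table, attrs):
    """Equivalence classes of IND(attrs): objects equal on every attribute in attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def approximations(table, attrs, target):
    """Classical lower and upper approximations of `target` (a set of object indices)."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:   # granule lies entirely inside the concept
            lower |= block
        if block & target:    # granule overlaps the concept
            upper |= block
    return lower, upper


def lower_approx_local(table, attrs, target):
    """Same lower approximation, visiting only objects of `target`:
    x lies in its own granule [x], so [x] <= target already implies x in target."""
    lower = set()
    for x in target:
        key = tuple(table[x][a] for a in attrs)
        granule = {i for i, row in enumerate(table)
                   if tuple(row[a] for a in attrs) == key}
        if granule <= target:
            lower.add(x)
    return lower


def dependency(table, attrs, labels):
    """|POS_attrs(D)| / |U|: share of objects whose granule has a single decision value."""
    pos = sum(len(b) for b in partition(table, attrs)
              if len({labels[i] for i in b}) == 1)
    return pos / len(table)


def greedy_reduct(table, labels, attrs):
    """Greedy forward selection (assumed heuristic; the result need not be minimal):
    add the attribute with the highest dependency until the full-set value is reached."""
    full = dependency(table, attrs, labels)
    reduct = []
    while dependency(table, reduct, labels) < full:
        best = max((a for a in attrs if a not in reduct),
                   key=lambda a: dependency(table, reduct + [a], labels))
        reduct.append(best)
    return reduct


concept = {i for i, y in enumerate(LABELS) if y == "y"}
print(approximations(TABLE, [0, 1, 2], concept))      # ({2, 3}, {0, 1, 2, 3})
print(lower_approx_local(TABLE, [0, 1, 2], concept))  # {2, 3}
print(dependency(TABLE, [0, 1, 2], LABELS))           # 0.6
print(greedy_reduct(TABLE, LABELS, [0, 1, 2]))        # [0, 1]
```

In this sketch, every greedy step recomputes a partition of the whole universe for each candidate attribute, so its cost grows with both the number of objects and the number of attributes. It also assumes that every object is labeled, which is exactly the assumption the abstract questions for big data.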
