A rough set approach for approximating differential dependencies

Abstract

Data dependencies in databases and attribute dependencies in decision systems are important for addressing problems of data quality and attribute reduction. In both settings, measures play a significant role in approximating these dependencies so that they adapt better to uncertain data. This paper proposes a differential-relation-based rough set model, from the perspective of relational databases, that expresses the dependency degree, error measures, confidence, information granulation and differential class distance for differential dependencies (DDs), together with the relationships among them, in a unified framework. Among these measures, the error measure $g_3$ has been widely studied and applied for data dependencies. However, computing $g_3$ for DDs is NP-complete. Therefore, based on the proposed rough set model, we introduce a new method that computes an approximation $\tilde{g}_3$ of $g_3$ in polynomial time. We demonstrate that our approach provides a substantially better approximation than the existing greedy method, that is, an approximation closer to the optimal value $g_3$. We also introduce the differential-relation-based rough set from the perspective of information systems and connect it to the rough sets induced by non-equivalence relations. These two views of the differential-relation-based rough set form an essential bridge between DDs in databases and attribute dependencies in differential decision systems (DDSs), allowing the two settings to share measures for approximating dependencies. These results are relevant to approximate computation, to the development of attribute reduction algorithms for decision systems and to the discovery of approximate differential dependencies (ADDs) in databases.
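To make the error measure concrete, the Python sketch below treats $g_3$ in the usual way from the approximate-dependency literature: the smallest fraction of tuples that must be deleted so that the dependency holds. Several choices here are assumptions for illustration, not the paper's definitions. A differential function is encoded as a hypothetical mapping from attribute to a closed interval on the absolute difference of numeric values. The exact value is found by brute-force minimum vertex cover over the graph of violating tuple pairs, which is exponential and consistent with the hardness result above. `g3_greedy` is a common max-degree heuristic standing in for a greedy baseline; it is not necessarily the greedy method the paper compares against. `dependency_degree` is a hypothetical neighbourhood-based degree in the spirit of rough sets over non-equivalence relations. The paper's polynomial-time $\tilde{g}_3$ is not reproduced.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Row = Dict[str, float]
# Hypothetical encoding of a differential function: attribute -> closed interval
# [lo, hi] on the absolute difference between two tuples' values.
DiffFunc = Dict[str, Tuple[float, float]]


def agrees(t: Row, s: Row, phi: DiffFunc) -> bool:
    """True if the pair (t, s) meets every distance interval in phi."""
    return all(lo <= abs(t[a] - s[a]) <= hi for a, (lo, hi) in phi.items())


def violations(r: List[Row], lhs: DiffFunc, rhs: DiffFunc) -> List[Tuple[int, int]]:
    """Tuple pairs that satisfy lhs but not rhs, i.e. violate the DD lhs -> rhs."""
    return [(i, j) for i, j in combinations(range(len(r)), 2)
            if agrees(r[i], r[j], lhs) and not agrees(r[i], r[j], rhs)]


def g3_exact(r: List[Row], lhs: DiffFunc, rhs: DiffFunc) -> float:
    """Smallest fraction of tuples whose removal leaves no violating pair.

    This is a minimum vertex cover of the violation graph, found here by
    brute force over subsets, so it is exponential and only for toy inputs.
    """
    n = len(r)
    if n == 0:
        return 0.0
    edges = violations(r, lhs, rhs)
    for k in range(n + 1):
        for removed in combinations(range(n), k):
            dropped = set(removed)
            if all(i in dropped or j in dropped for i, j in edges):
                return k / n
    return 1.0  # unreachable: removing every tuple covers all edges


def g3_greedy(r: List[Row], lhs: DiffFunc, rhs: DiffFunc) -> float:
    """Upper bound on g3: repeatedly drop the tuple in the most remaining violations."""
    if not r:
        return 0.0
    edges = set(violations(r, lhs, rhs))
    removed = 0
    while edges:
        degree: Dict[int, int] = {}
        for i, j in edges:
            degree[i] = degree.get(i, 0) + 1
            degree[j] = degree.get(j, 0) + 1
        worst = max(degree, key=degree.get)
        edges = {(i, j) for i, j in edges if worst not in (i, j)}
        removed += 1
    return removed / len(r)


def dependency_degree(r: List[Row], lhs: DiffFunc, rhs: DiffFunc) -> float:
    """Hypothetical rough-set-style degree: share of tuples whose lhs-neighbourhood
    lies inside their rhs-neighbourhood (a lower-approximation-style test)."""
    n = len(r)
    if n == 0:
        return 0.0

    def neighbourhood(i: int, phi: DiffFunc) -> set:
        return {j for j in range(n) if j != i and agrees(r[i], r[j], phi)}

    positive = [i for i in range(n) if neighbourhood(i, lhs) <= neighbourhood(i, rhs)]
    return len(positive) / n


# Made-up toy relation and the DD  A[0, 1] -> B[0, 2].
r = [{"A": 1, "B": 10}, {"A": 1.5, "B": 11}, {"A": 2, "B": 20}, {"A": 10, "B": 5}]
lhs = {"A": (0, 1)}
rhs = {"B": (0, 2)}
print(violations(r, lhs, rhs))  # [(0, 2), (1, 2)]
print(g3_exact(r, lhs, rhs), g3_greedy(r, lhs, rhs))  # 0.25 0.25
```

On this toy relation the greedy heuristic happens to match the exact value. On larger violation graphs a max-degree greedy cover can remove more tuples than necessary, which is the kind of gap a better polynomial-time approximation aims to close. Note also that `dependency_degree` returns 0.25 here while $1 - g_3 = 0.75$, a reminder that these measures capture different aspects of the same dependency.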
