Uncertain training data set conceptual reduction: A machine learning perspective

Knowledge discovery from data is a challenging problem of significant importance in fields such as biology, economics and the social sciences. Real-world data is often incomplete and ambiguous, and its rapid growth in size complicates analysis. Data reduction techniques that account for uncertainty are therefore needed. In this paper, our objective is to conceptually reduce uncertain data without losing information. We propose two reduction methods rooted in formal concept analysis. The first targets approximate data reduction; it uses the result of Baixeries et al. for detecting functional dependencies, transforming a database instance into an approximate formal context. The second targets fuzzy data reduction and employs the algorithm of Elloumi et al., which is based on Lukasiewicz logic. Both methods are compared with three other machine-learning-based reduction algorithms in a classification case study on breast cancer data. Classification accuracy, root mean square error and reduced data size are reported, showing that training sets reduced with our methods yield very accurate classifiers with minimal data size. The reduced data also requires less communication time and memory space.
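
As a rough illustration of two building blocks named in the abstract, the sketch below shows (1) the standard formal-concept-analysis view of functional dependencies, in which every pair of tuples becomes an object described by the attributes on which the two tuples agree, and (2) the Lukasiewicz implication on truth degrees in [0, 1]. This is a minimal sketch under stated assumptions, not the authors' reduction methods: the function names, the toy relation and the exact-agreement test are all hypothetical, and neither the approximate formal context nor the algorithm of Elloumi et al. is reproduced here.

```python
from itertools import combinations


def agreement_context(rows, attributes):
    """Build a binary context with one object per pair of tuples.

    Each pair (i, j) is mapped to the set of attributes on which
    rows[i] and rows[j] take the same value (exact equality is an
    assumption of this sketch).
    """
    return {
        (i, j): frozenset(a for a in attributes if rows[i][a] == rows[j][a])
        for i, j in combinations(range(len(rows)), 2)
    }


def fd_holds(context, lhs, rhs):
    """Check X -> Y: every pair agreeing on all of X also agrees on all of Y."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return all(rhs <= agree for agree in context.values() if lhs <= agree)


def lukasiewicz_implication(a, b):
    """Lukasiewicz implication: min(1, 1 - a + b) for a, b in [0, 1]."""
    return min(1.0, 1.0 - a + b)


# Toy relation (hypothetical data, not taken from the paper).
ATTRIBUTES = ["a", "b", "c"]
ROWS = [
    {"a": 1, "b": "x", "c": "p"},
    {"a": 2, "b": "x", "c": "p"},
    {"a": 3, "b": "y", "c": "q"},
    {"a": 4, "b": "z", "c": "q"},
]
CONTEXT = agreement_context(ROWS, ATTRIBUTES)

# b -> c holds on this toy relation, while c -> b does not.
B_DETERMINES_C = fd_holds(CONTEXT, ["b"], ["c"])
C_DETERMINES_B = fd_holds(CONTEXT, ["c"], ["b"])
```

In this toy relation, rows 2 and 3 share the value of `c` but differ on `b`, which is why `c -> b` fails while `b -> c` holds.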

[1]  Sergei O. Kuznetsov,et al.  Learning of Simple Conceptual Graphs from Positive and Negative Examples , 1999, PKDD.

[2]  Cory J. Butz,et al.  FD_Mine: discovering functional dependencies in a database using equivalences , 2002, 2002 IEEE International Conference on Data Mining.

[3]  Wang Li,et al.  Data Dimension Reduction Based on Concept Lattices in Image Mining , 2009, 2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery.

[4]  Ron Kohavi,et al.  Feature Subset Selection Using the Wrapper Method: Overfitting and Dynamic Search Space Topology , 1995, KDD.

[5]  Huan Liu,et al.  Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution , 2003, ICML.

[6]  Pooja Arora A Comparative Study of Instance Reduction Techniques , 2013 .

[7]  Jean-Marc Petit,et al.  Efficient Discovery of Functional Dependencies and Armstrong Relations , 2000, EDBT.

[8]  Edward L. Robertson,et al.  FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances - Extended Abstract , 2001, DaWaK.

[9]  Vincent Duquenne,et al.  Attribute-incremental construction of the canonical implication basis , 2007, Annals of Mathematics and Artificial Intelligence.

[10]  Jinhai Li,et al.  Incomplete decision contexts: Approximate concept construction, rule acquisition and knowledge reduction , 2013, Int. J. Approx. Reason..

[11]  Ying Bai,et al.  Fundamentals of Fuzzy Logic Control — Fuzzy Sets, Fuzzy Rules and Defuzzifications , 2006 .

[12]  C. A. Murthy,et al.  Unsupervised Feature Selection Using Feature Similarity , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Angel Mora,et al.  Computing Left-Minimal Direct Basis of implications , 2013, CLA.

[14]  Marie-Jeanne Lesot,et al.  Similarity measures for binary and numerical data: a survey , 2008, Int. J. Knowl. Eng. Soft Data Paradigms.

[15]  魏玲,et al.  Approximate concepts acquisition based on formal contexts , 2015 .

[16]  A. Jaoua,et al.  Discovering knowledge from fuzzy concept lattice , 2001 .

[17]  Yulian Zhu,et al.  Subpattern-based principle component analysis , 2004, Pattern Recognit..

[18]  Samir Elloumi,et al.  Galois Connection in Fuzzy Binary Relations, Applications for Discovering Association Rules and Decision Making , 2000, RelMiCS.

[19]  Qiang Shen,et al.  Fuzzy-rough attribute reduction with application to web categorization , 2022, Fuzzy Sets and Systems.

[20]  Hannu Toivonen,et al.  TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies , 1999, Comput. J..

[21]  อนิรุธ สืบสิงห์,et al.  Data Mining Practical Machine Learning Tools and Techniques , 2014 .

[22]  Yan-Hui Zhai,et al.  Generating complete set of implications for formal contexts , 2008, Knowl. Based Syst..

[23]  Ali Jaoua,et al.  Using Formal Concept Analysis for Heterogeneous Information Retrieval , 2005, CLA.

[24]  Vilém Vychodil,et al.  Towards Armstrong-Style Inference System for Attribute Implications with Temporal Semantics , 2014, MDAI.

[25]  Wei-Zhi Wu,et al.  Approaches to knowledge reduction based on variable precision rough set model , 2004, Inf. Sci..

[26]  Tao Wu,et al.  Image mining for robot vision based on concept analysis , 2007, 2007 IEEE International Conference on Robotics and Biomimetics (ROBIO).

[27]  Felix Naumann,et al.  DFD: Efficient Functional Dependency Discovery , 2014, CIKM.

[28]  Felix Naumann,et al.  Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms , 2015, Proc. VLDB Endow..

[29]  Malcolm J. Beynon,et al.  Reducts within the variable precision rough sets model: A further investigation , 2001, Eur. J. Oper. Res..

[30]  Samir Elloumi,et al.  A multi-level conceptual data reduction approach based on the Lukasiewicz implication , 2004, Inf. Sci..

[31]  Yang Xu,et al.  Decision Making with Uncertainty Information Based on Lattice-Valued Fuzzy Concept Lattice , 2010, J. Univers. Comput. Sci..

[32]  J. Deogun,et al.  Concept approximations based on rough sets and similarity measures , 2001 .

[33]  Saroj Ratnoo A Comparative Study of Instance Reduction Techniques , 2013 .

[34]  Abraham Kandel,et al.  Fuzzification and reduction of information-theoretic rule sets , 2001 .

[35]  Qinghua Hu,et al.  Information-preserving hybrid data reduction based on fuzzy-rough techniques , 2006, Pattern Recognit. Lett..

[36]  Amedeo Napoli,et al.  Characterizing functional dependencies in formal concept analysis with pattern structures , 2014, Annals of Mathematics and Artificial Intelligence.

[37]  Samir Elloumi,et al.  Galois connection, formal concepts and Galois lattice in real relations: application in a real classifier , 2002, J. Syst. Softw..

[38]  J.M. Rodriguez-Jimenez,et al.  Negative Attributes and Implications in Formal Concept Analysis , 2014, ITQM.

[39]  Rosine Cicchetti,et al.  FUN: An Efficient Algorithm for Mining Functional and Embedded Dependencies , 2001, ICDT.

[40]  Peter A. Flach,et al.  Database Dependency Discovery: A Machine Learning Approach , 1999, AI Commun..

[41]  Duoqian Miao,et al.  Analysis on attribute reduction strategies of rough set , 1998, Journal of Computer Science and Technology.