Discovery of the D-basis in binary tables based on hypergraph dualization

The discovery of (strong) association rules, or implications, is an important task in data management, with applications in artificial intelligence, data mining and the semantic web. We introduce a novel approach to discovering a specific set of implications, called the D-basis, which represents a reduced binary table in terms of the structure of its Galois lattice. At the core of the method are the D-relation, defined in a lattice-theoretic framework, and a hypergraph dualization algorithm that allows us to effectively produce the set of transversals of a given Sperner hypergraph. The latter algorithm, first developed by specialists from the Rutgers Center for Operations Research, has already found numerous applications in solving optimization problems in database theory, artificial intelligence and game theory. One application of the method is the analysis of gene expression data related to a particular phenotypic variable; some initial testing has been done on data provided by the University of Hawaii Cancer Center.
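
To make the dualization step concrete, the following is a minimal sketch of what it means to compute the minimal transversals (minimal hitting sets) of a hypergraph. It uses the simple Berge multiplication scheme, which is an assumption made here for illustration only: it is not the algorithm referred to in the abstract and does not share its complexity guarantees. The function names `minimize` and `minimal_transversals` are hypothetical.

```python
from typing import FrozenSet, Hashable, Iterable, Set


def minimize(family: Iterable[FrozenSet[Hashable]]) -> Set[FrozenSet[Hashable]]:
    """Keep only the inclusion-minimal sets of a family."""
    family = set(family)
    return {s for s in family if not any(t < s for t in family)}


def minimal_transversals(edges: Iterable[Iterable[Hashable]]) -> Set[FrozenSet[Hashable]]:
    """Berge multiplication (illustrative sketch, not the paper's algorithm).

    Processes the edges one at a time, keeping the minimal transversals
    of the edges seen so far.
    """
    transversals: Set[FrozenSet[Hashable]] = {frozenset()}
    for raw_edge in edges:
        edge = frozenset(raw_edge)
        candidates: Set[FrozenSet[Hashable]] = set()
        for t in transversals:
            if t & edge:
                # t already hits the new edge, so it stays a transversal.
                candidates.add(t)
            else:
                # Extend t by each vertex of the new edge in turn.
                for v in edge:
                    candidates.add(t | {v})
        transversals = minimize(candidates)
    return transversals


if __name__ == "__main__":
    # A small Sperner hypergraph: no edge contains another.
    hypergraph = [{1, 2}, {2, 3}, {3, 4}]
    dual = minimal_transversals(hypergraph)
    print(sorted(sorted(t) for t in dual))
```

For a Sperner hypergraph, dualizing twice returns the original edge set, which gives a convenient sanity check for any implementation of this step.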
