Biclustering meets triadic concept analysis

Biclustering numerical data became a popular data-mining task in the early 2000s, especially for gene expression data analysis and recommender systems. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data table. So-called biclusters of similar values can be thought of as maximal sub-tables with close values. Only a few methods address the complete, correct, and non-redundant enumeration of such patterns, which is a well-known intractable problem, and no formal framework exists for it. We introduce important links between biclustering and Formal Concept Analysis (FCA). Indeed, FCA is known to be, among other things, a methodology for biclustering binary data. Handling numerical data is not direct, and we argue that Triadic Concept Analysis (TCA), the extension of FCA to ternary relations, provides a powerful mathematical and algorithmic framework for biclustering numerical data. We therefore discuss both theoretical and computational aspects of biclustering numerical data with triadic concept analysis. These results also scale to n-dimensional numerical datasets.
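
To make the notion of a "maximal sub-table with close values" concrete, here is a minimal Python sketch. It rests on assumptions not stated in the abstract: "close" is taken to mean that the largest and smallest values of the sub-table differ by at most a threshold `theta`, and the table `TABLE`, the threshold, and the function names are hypothetical, chosen only for illustration. It checks a single candidate bicluster; it is not an enumeration algorithm and not the TCA-based method itself.

```python
from itertools import product

# Toy numerical table (rows = objects, columns = attributes); illustrative only.
TABLE = [
    [1, 2, 2, 1, 6],
    [2, 1, 1, 0, 6],
    [2, 2, 1, 7, 6],
    [8, 9, 2, 6, 7],
]

def is_similar_bicluster(table, rows, cols, theta):
    """True if all values in the sub-table rows x cols differ by at most theta."""
    values = [table[r][c] for r, c in product(rows, cols)]
    return max(values) - min(values) <= theta

def is_maximal(table, rows, cols, theta):
    """True if no single row or column can be added while keeping similarity.

    Checking single extensions is enough: any sub-table of a similar
    sub-table is itself similar, so a larger similar bicluster would
    imply that at least one single-row or single-column extension works.
    """
    for r in range(len(table)):
        if r not in rows and is_similar_bicluster(table, rows | {r}, cols, theta):
            return False
    for c in range(len(table[0])):
        if c not in cols and is_similar_bicluster(table, rows, cols | {c}, theta):
            return False
    return True

if __name__ == "__main__":
    rows, cols = {0, 1, 2}, {0, 1, 2}
    print(is_similar_bicluster(TABLE, rows, cols, theta=1))
    print(is_maximal(TABLE, rows, cols, theta=1))
```

With `theta = 1`, the first three rows and first three columns of this toy table hold only the values 1 and 2, and every extension by one row or column introduces a value too far away, so the candidate is both similar and maximal under the assumed definition.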
