Triadic Formal Concept Analysis and triclustering: searching for optimal patterns

This paper presents several definitions of “optimal patterns” in triadic data and the results of an experimental comparison of five triclustering algorithms on real-world and synthetic datasets. The evaluation is carried out over criteria such as resource efficiency, noise tolerance, and quality scores involving the cardinality, density, coverage, and diversity of the patterns. An ideal triadic pattern is a totally dense maximal cuboid (a formal triconcept). The relaxations of this notion considered here are OAC-triclusters, triclusters optimal with respect to the least-squares criterion, and graph partitions obtained by spectral clustering. We show that searching for an optimal tricluster cover is an NP-complete problem, whereas determining the number of such covers is #P-complete. Our extensive computational experiments lead to a clear strategy for choosing a solution for a given dataset, guided by the principle of Pareto-optimality with respect to the proposed criteria.
