Triclustering Algorithms for Three-Dimensional Data Analysis

Three-dimensional data are increasingly prevalent across biomedical and social domains. Notable examples are gene-sample-time, individual-feature-time, and node-node-time data, generally referred to as observation-attribute-context data. The unsupervised analysis of three-dimensional data can be pursued to discover putative biological modules, disease progression profiles, and communities of individuals with coherent behavior, among other patterns of interest. Such analysis is thus key to enhancing the understanding of complex biological, individual, and societal systems. Although clustering can be applied to group observations, its relevance is limited because observations in three-dimensional data domains are typically only meaningfully correlated on subspaces of the overall space. Biclustering tackles this challenge but disregards the third dimension. Triclustering, the discovery of coherent subspaces within three-dimensional data, has therefore been widely researched to address both limitations. Despite the diversity of contributions in this field, a structured view is still lacking on the major requirements of triclustering, desirable forms of homogeneity (including coherency, structure, quality, locality, and orthonormality criteria), and algorithmic approaches. This work formalizes the triclustering task and its scope, introduces a taxonomy to categorize the contributions in the field, provides a comprehensive comparison of state-of-the-art triclustering algorithms according to their behavior and output, and lists relevant real-world applications. Finally, it highlights challenges and opportunities to advance the field of triclustering and its applicability to complex three-dimensional data analysis.
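To make the notion of a coherent subspace concrete, the following is a minimal sketch, not any of the surveyed algorithms. It assumes the simplest possible homogeneity criterion (constant values, scored by population variance) and uses made-up toy data. The names `subspace_values` and `tricluster_variance` are hypothetical helpers introduced only for illustration.

```python
from statistics import pvariance


def subspace_values(data, obs, attrs, ctxs):
    """Collect the values of the sub-tensor selected by three index subsets."""
    return [data[i][j][k] for i in obs for j in attrs for k in ctxs]


def tricluster_variance(data, obs, attrs, ctxs):
    """Population variance of the selected subspace (0 means constant values).

    Assumption: variance stands in for a homogeneity criterion; real
    triclustering algorithms use richer coherency and quality measures.
    """
    return pvariance(subspace_values(data, obs, attrs, ctxs))


# Toy 4 x 3 x 2 data set (observations x attributes x contexts); values are made up.
data = [
    [[(7 * i + 3 * j + 5 * k) % 11 for k in range(2)] for j in range(3)]
    for i in range(4)
]

# Plant a constant tricluster on observations {0, 2}, attributes {0, 1}, contexts {0, 1}.
planted = ([0, 2], [0, 1], [0, 1])
for i in planted[0]:
    for j in planted[1]:
        for k in planted[2]:
            data[i][j][k] = 5

planted_var = tricluster_variance(data, *planted)
full_var = tricluster_variance(data, range(4), range(3), range(2))

print("variance inside planted tricluster:", planted_var)
print("variance over the full data set:", full_var)
```

The planted subspace is perfectly homogeneous while the full data set is not, which is the situation the abstract describes: observations are coherent only on a subset of attributes and contexts, so grouping them over the overall space would miss the pattern.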
