Biclustering with Flexible Plaid Models to Unravel Interactions between Biological Processes

Genes can participate in multiple biological processes at once, so their expression can be seen as a composition of the contributions from the active processes. Biclustering under a plaid assumption allows the modeling of interactions between transcriptional modules or biclusters (subsets of genes with coherent expression across subsets of conditions) by assuming an additive composition of contributions in their overlapping areas. Despite the biological interest of plaid models, few biclustering algorithms consider plaid effects. When they do, they restrict the allowed types and structures of biclusters and suffer from robustness problems because they require exact additive matches. We propose BiP (Biclustering using Plaid models), a biclustering algorithm with relaxations that allow expression levels to change in overlapping areas according to biologically meaningful assumptions (weighted and noise-tolerant composition of contributions). BiP can be applied on top of existing biclustering solutions, leveraging their benefits. It can recover areas excluded because of unaccounted plaid effects and detect noisy areas not explained by a plaid assumption, thus producing an explanatory model of overlapping transcriptional activity. Experiments on synthetic data support BiP's efficiency and effectiveness. Models learned from expression data unravel meaningful and non-trivial functional interactions between biological processes associated with putative regulatory modules.
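The following sketch illustrates the additive composition of contributions in overlapping areas. It assumes the classic plaid formulation of Lazzeroni and Owen, in which a bicluster k adds mu_k + alpha_ik + beta_jk to every cell (i, j) it covers. It is not BiP itself. The helper names plaid_compose and explains_overlap are hypothetical. The weighted, tolerance-based check is only a rough stand-in for the weighted and noise-tolerant relaxations described in the abstract, under those same assumptions.

```python
import numpy as np


def plaid_compose(shape, layers, background=0.0):
    """Build an expression matrix under an additive plaid assumption.

    Each layer is (rows, cols, mu, alpha, beta): a bicluster over the given
    rows/columns contributing mu + alpha[i] + beta[j] to each covered cell.
    Cells covered by several biclusters receive the sum of contributions.
    """
    data = np.full(shape, background, dtype=float)
    for rows, cols, mu, alpha, beta in layers:
        alpha = np.asarray(alpha, dtype=float)[:, None]  # row effects
        beta = np.asarray(beta, dtype=float)[None, :]    # column effects
        theta = mu + alpha + beta                        # bicluster contribution
        data[np.ix_(rows, cols)] += theta                # additive overlap
    return data


def explains_overlap(observed, contributions, weights=None, tol=0.0):
    """Hypothetical relaxed check: is an observed value in an overlapping
    area explained by a weighted sum of contributions, within tol?"""
    contributions = np.asarray(contributions, dtype=float)
    if weights is None:
        weights = np.ones_like(contributions)  # plain additive composition
    expected = float(np.dot(weights, contributions))
    return abs(observed - expected) <= tol


# Two constant biclusters that overlap on cell (2, 2).
layers = [
    ([0, 1, 2], [0, 1, 2], 2.0, [0, 0, 0], [0, 0, 0]),
    ([2, 3], [2, 3], 3.0, [0, 0], [0, 0]),
]
X = plaid_compose((5, 5), layers)
print(X[2, 2])  # 5.0: the contributions 2.0 and 3.0 add up

# An exact additive match rejects a slightly noisy value; a tolerance accepts it.
print(explains_overlap(5.3, [2.0, 3.0], tol=0.0))  # False
print(explains_overlap(5.3, [2.0, 3.0], tol=0.5))  # True
```

Requiring tol=0.0 mirrors the exact additive matching criticized in the abstract. A positive tolerance, or non-unit weights, is one simple way to relax it.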
