Discovery of bidirectional contiguous column coherent bicluster in time-series gene expression data

Abstract: The application of high-throughput microarrays has produced massive amounts of gene expression data, creating a need for effective analysis methods. Biclustering has emerged as a useful tool: it clusters rows and columns simultaneously to find subsets of genes that are coherently expressed under subsets of conditions. In particular, when analyzing time-series gene expression data, it is meaningful to restrict biclusters to contiguous time points with coherent evolutions. In this paper, the BCCC-Bicluster is proposed as an extension of the CCC-Bicluster. An exact algorithm based on frequent sequential pattern mining is proposed to find all maximal BCCC-Biclusters. A newly defined Frequent-Infrequent Tree-Array (FITA) is constructed to speed up the traversal process, with pruning strategies derived from the Apriori property to avoid redundant work. For further efficiency, the bitwise XOR operation is applied to capture identical or opposite contiguous patterns between two rows. The algorithm is tested on simulated data, yeast microarray data and human microarray data. The experimental results show that the proposed algorithm recovers the planted biclusters in the synthetic data better than the CCC-Bicluster approach, and that it outperforms the variant without FITA in speed and scalability. In the enrichment analysis, BCCC-Biclusters are shown to find more significant GO terms involved in biological processes than three other recent kinds of biclusters.
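The XOR step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each row has already been discretized into a string of up/down symbols between consecutive time points ('U' and 'D'), and the names `encode`, `xor_runs` and `min_len` are hypothetical. After XOR, a run of 0 bits marks a contiguous segment where the two rows evolve identically, and a run of 1 bits marks a segment where they evolve in opposite directions.

```python
def encode(pattern):
    """Pack a string of 'U'/'D' symbols into an integer (U -> 1, D -> 0).

    The two-symbol alphabet is an assumption made for this sketch.
    """
    if any(s not in "UD" for s in pattern):
        raise ValueError("pattern may contain only 'U' and 'D'")
    return int("".join("1" if s == "U" else "0" for s in pattern), 2)


def xor_runs(pattern_a, pattern_b, min_len=2):
    """Find maximal contiguous runs where two rows agree or are opposite.

    Returns a list of (start, end, kind) tuples with `end` exclusive and
    `kind` either "identical" or "opposite". Runs shorter than `min_len`
    (a hypothetical threshold) are dropped.
    """
    if len(pattern_a) != len(pattern_b):
        raise ValueError("rows must cover the same time points")
    n = len(pattern_a)
    if n == 0:
        return []

    # One bitwise XOR compares all positions at once:
    # bit 0 = same direction, bit 1 = opposite direction.
    diff = format(encode(pattern_a) ^ encode(pattern_b), "0{}b".format(n))

    runs = []
    start = 0
    for i in range(1, n + 1):
        # Close the current run at the end of the string or on a bit change.
        if i == n or diff[i] != diff[start]:
            if i - start >= min_len:
                kind = "identical" if diff[start] == "0" else "opposite"
                runs.append((start, i, kind))
            start = i
    return runs


if __name__ == "__main__":
    print(xor_runs("UUDDUD", "UUDUDU"))
```

Packing each row into an integer means the comparison of two rows costs a single machine-level XOR rather than a symbol-by-symbol loop, which is the kind of saving the abstract attributes to the bitwise operation. The run extraction that follows is only one simple way to read the result.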
