Pairwise gene GO-based measures for biclustering of high-dimensional expression data

Background: Biclustering algorithms search gene expression data for groups of genes that behave similarly under a subset of samples. The biological knowledge now available in public repositories can be used to steer these algorithms toward biclusters made up of functionally coherent genes. In particular, a distance between genes can be defined from their annotations in the Gene Ontology (GO): a pairwise gene GO semantic similarity measure assigns each pair of genes a value that quantifies their functional similarity. This paper studies a scatter search-based algorithm that optimizes a merit function integrating GO information. The merit function includes a term that incorporates this information through a GO measure.

Results: The effect of two different pairwise gene GO measures on the performance of the algorithm is analyzed. First, three well-known yeast datasets with approximately one thousand genes are studied. Second, a group of human datasets related to clinical cancer data is explored. Most of these are high-dimensional datasets with a very large number of genes. When the search is driven by one of the proposed GO measures, the resulting biclusters reveal groups of genes linked by the same functionality. Furthermore, a qualitative biological study of a group of biclusters shows their relevance from a cancer perspective.

Conclusions: Integrating biological information improves the performance of the biclustering process. Both GO measures studied improve the results obtained for the yeast datasets. However, when datasets contain a very large number of genes, only one of the two measures improves the performance of the algorithm. This second case is a clear option for exploring datasets that are interesting from a clinical point of view.
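As a rough illustration of how a GO term can enter a merit function, the sketch below adds a mean pairwise GO distance to an expression-based cost. Everything in it is an assumption made for illustration: the Jaccard overlap of annotation sets stands in for a pairwise GO similarity and is not one of the two measures studied in the paper, and the names `go_similarity`, `mean_go_distance`, `merit`, and `weight`, along with the term labels `T1` to `T3`, are hypothetical.

```python
from itertools import combinations


def go_similarity(terms_a, terms_b):
    """Jaccard overlap of two GO term sets (an illustrative stand-in,
    not the measure used in the paper). Returns a value in [0, 1]."""
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def mean_go_distance(genes, annotations):
    """Average of (1 - similarity) over all gene pairs in a bicluster.
    Genes missing from `annotations` are treated as unannotated."""
    pairs = list(combinations(genes, 2))
    if not pairs:
        return 0.0
    total = sum(
        1.0 - go_similarity(annotations.get(a, set()), annotations.get(b, set()))
        for a, b in pairs
    )
    return total / len(pairs)


def merit(expression_cost, genes, annotations, weight=0.5):
    """Hypothetical merit to minimize: expression-based cost plus a
    weighted GO penalty. `weight` is an assumed trade-off parameter."""
    return expression_cost + weight * mean_go_distance(genes, annotations)


# Toy annotations with placeholder term labels (not real GO identifiers).
annotations = {
    "g1": {"T1", "T2"},
    "g2": {"T1"},
    "g3": {"T3"},
}

coherent = merit(1.0, ["g1", "g2"], annotations)    # genes share a term
incoherent = merit(1.0, ["g1", "g3"], annotations)  # genes share nothing
```

With the same expression cost, the bicluster whose genes share annotations receives the lower merit value, which is the behavior a GO-driven search would favor under these assumptions.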
