Using Formal Concept Analysis to Identify Negative Correlations in Gene Expression Data

Recent biological studies have reported that, in many biological processes and pathways, two groups of genes tend to show negatively correlated, or opposite, expression tendencies. Such negative correlation between genes may reflect an important biological mechanism. In this study, we propose a negative correlation algorithm based on Formal Concept Analysis (FCA), called NCFCA, that can effectively identify opposite expression tendencies between two gene groups in gene expression data. Applying it to expression data of cell cycle-regulated genes in yeast, we found that six minichromosome maintenance (MCM) family genes showed expression tendencies opposite to those of eight core histone family genes. Furthermore, our analysis indicated that this negative correlation pattern between the two families may be conserved in the cell cycle. Finally, we discuss possible reasons for the negative correlation between the six MCM family genes and the eight core histone family genes. Our results suggest that negative correlation is a potentially important mechanism for maintaining the balance of biological systems by repressing some genes while inducing others. Studying it may therefore provide new understanding of gene expression and regulation and of the causes of disease.
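The abstract does not describe how NCFCA works internally. As a rough illustration only, the Python sketch below shows one way formal concepts can expose opposite expression tendencies between gene groups; it is not the authors' NCFCA algorithm. It assumes a hypothetical binarization in which a gene receives the attribute `(t, "+")` when its expression value at time point `t` is at or above a threshold, and `(t, "-")` when it is at or below the negated threshold. The function names, the `threshold`, and the `min_attrs` parameter are illustrative assumptions. The sketch enumerates concept intents naively (intents are exactly the intersections of object intents, plus the full attribute set). This is exponential in the worst case, so it is suitable only for small toy contexts.

```python
def binarize(expr, threshold=1.0):
    """Hypothetical discretization: map each gene's profile to a set of
    (time_point, sign) attributes, forming a formal context."""
    context = {}
    for gene, values in expr.items():
        attrs = set()
        for t, v in enumerate(values):
            if v >= threshold:
                attrs.add((t, "+"))   # up-regulated at time point t
            elif v <= -threshold:
                attrs.add((t, "-"))   # down-regulated at time point t
        context[gene] = frozenset(attrs)
    return context


def concept_intents(context):
    """All concept intents: the full attribute set plus every intersection
    of object (gene) intents. Naive closure, fine only for small contexts."""
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for obj_intent in context.values():
        # Intersect this gene's intent with every intent found so far.
        intents |= {obj_intent & b for b in intents}
    return intents


def extent(context, intent):
    """Genes that carry every attribute in `intent`."""
    return frozenset(g for g, attrs in context.items() if intent <= attrs)


def flip(intent):
    """Mirror an intent: 'up at t' becomes 'down at t' and vice versa."""
    return frozenset((t, "-" if s == "+" else "+") for t, s in intent)


def negative_pairs(context, min_attrs=2):
    """Pairs of gene groups whose shared pattern is the exact sign-mirror
    of each other's on at least `min_attrs` time points (hypothetical criterion)."""
    found = set()
    for intent in concept_intents(context):
        if len(intent) < min_attrs:
            continue
        group = extent(context, intent)
        opposite = extent(context, flip(intent))
        if group and opposite:
            # Unordered pair, so (A, B) and (B, A) are reported once.
            found.add(frozenset([group, opposite]))
    return found


if __name__ == "__main__":
    # Toy, made-up profiles (not data from the study).
    toy = {
        "g1": [2.0, -1.5, 0.2, 1.2],
        "g2": [1.8, -2.0, 0.0, 1.1],
        "g3": [-1.7, 1.6, 0.1, -1.3],
        "g4": [-2.1, 1.2, -0.3, -1.0],
        "g5": [0.1, 0.2, 1.5, 0.0],
    }
    for pair in negative_pairs(binarize(toy)):
        print(sorted(sorted(group) for group in pair))
        # prints [['g1', 'g2'], ['g3', 'g4']]
```

In this toy context, genes g1 and g2 share the intent "up at t0, down at t1, up at t3", and g3 and g4 share its mirror, so the sketch reports {g1, g2} versus {g3, g4} as one negatively correlated pair. Gene g5, which changes at only one time point, is filtered out by `min_attrs`. A real analysis would need a principled discretization and a fast concept miner rather than this naive enumeration.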
