Negative correlations in collaboration: concepts and algorithms

This paper studies the efficient mining of negative correlations that occur in collaboration. A collaborating negative correlation is a negative correlation between two sets of variables, rather than, as traditionally defined, between a pair of variables. It signifies a synchronized rise or fall in the values of all variables within one set whenever all variables in the other set jointly move in the opposite direction. Mining such correlations has exponential time complexity. The high efficiency of our algorithm is attributed to two factors: (i) the transformation of the original data into a bipartite graph database, and (ii) the mining of transpose closures from a wide transactional database. As one example among many real-life domains, we apply the method to yeast gene expression data and use Pearson's correlation coefficient and P-values to evaluate the biological relevance of the collaborating negative correlations.
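To make the definition concrete, here is a minimal sketch. It assumes that each variable is a time series and that a "rise" or "fall" is the sign of change between consecutive time points. The names `trend_signs`, `supporting_steps`, `brute_force_pairs` and the `min_support` threshold are hypothetical illustrations, not the paper's own. The brute-force enumerator is deliberately naive and is not the paper's algorithm. It only shows why the search space over pairs of variable sets grows exponentially.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def trend_signs(series: Sequence[float]) -> List[int]:
    """Sign of change between consecutive time points: +1 rise, -1 fall, 0 flat."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]


def supporting_steps(data: Dict[str, Sequence[float]],
                     xs: Tuple[str, ...], ys: Tuple[str, ...]) -> List[int]:
    """Indices of time steps at which all variables in xs move in one direction
    while all variables in ys jointly move in the opposite direction."""
    signs = {v: trend_signs(data[v]) for v in xs + ys}
    steps = []
    for t in range(len(signs[xs[0]])):
        sx = {signs[v][t] for v in xs}
        sy = {signs[v][t] for v in ys}
        # Each set must move in unison; the two directions must be opposite and non-flat.
        if len(sx) == 1 and len(sy) == 1:
            a, b = next(iter(sx)), next(iter(sy))
            if a != 0 and b == -a:
                steps.append(t)
    return steps


def brute_force_pairs(data: Dict[str, Sequence[float]],
                      min_support: int) -> List[Tuple[Tuple[str, ...], Tuple[str, ...], int]]:
    """Naively test every pair of disjoint, non-empty variable sets (X, Y).
    The number of candidate pairs grows exponentially with the number of variables."""
    names = sorted(data)
    found = []
    for i in range(1, len(names)):
        for xs in combinations(names, i):
            rest = [v for v in names if v not in xs]
            for j in range(1, len(rest) + 1):
                for ys in combinations(rest, j):
                    if xs > ys:  # the relation is symmetric; report each pair once
                        continue
                    support = len(supporting_steps(data, xs, ys))
                    if support >= max(min_support, 1):
                        found.append((xs, ys, support))
    return found


# Hypothetical toy data: g1 and g2 rise and fall together, g3 moves opposite, g4 is flat.
data = {"g1": [1, 2, 3, 2], "g2": [0, 1, 2, 1], "g3": [5, 4, 3, 4], "g4": [1, 1, 1, 1]}
print(brute_force_pairs(data, min_support=3))
# → [(('g1',), ('g3',), 3), (('g2',), ('g3',), 3), (('g1', 'g2'), ('g3',), 3)]
```

The sketch reports every qualifying pair, including non-maximal ones such as `({g1}, {g3})` alongside `({g1, g2}, {g3})`. The paper instead relies on a bipartite graph database and transpose closures to keep mining efficient. This sketch does not reproduce that approach.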
