Improved prediction of protein interaction from microarray data using asymmetric correlation

Abstract

Background: Detection of correlated gene expression is a fundamental process in the characterization of gene functions using microarray data. Commonly used methods such as the Pearson correlation can detect only a fraction of interactions between genes or their products. However, the performance of correlation analysis can be significantly improved either by providing additional biological information or by combining correlation with other techniques that can extract various mathematical or statistical properties of gene expression from microarray data. In this article, I will test the performance of three correlation methods, namely the Pearson correlation, the rank (Spearman) correlation, and the Mutual Information approach, in detection of protein-protein interactions, and I will further examine the properties of these techniques when they are used together. I will also develop a new correlation measure which can be used with other measures to improve predictive power.

Results: Using data from 5,896 microarray hybridizations, the three measures were obtained for 30,499 known protein-interacting pairs in the Human Protein Reference Database (HPRD). Pearson correlation showed the best sensitivity (0.305) but the three measures showed similar specificity (0.240 to 0.257). When the three measures were compared, it was found that better specificity could be obtained at a high Pearson coefficient combined with a low Spearman coefficient or Mutual Information. Using a toy model of two gene interactions, I found that such measure combinations were most likely to exist at stronger curvature. I therefore introduced a new measure, termed asymmetric correlation (AC), which directly quantifies the degree of curvature in the expression levels of two genes as a degree of asymmetry. I found that AC performed better than the other measures, particularly when high specificity was required. Moreover, a combination of AC with other measures significantly improved specificity and sensitivity, by up to 50%.

Conclusions: A combination of correlation measures, particularly AC and Pearson correlation, can improve prediction of protein-protein interactions. Further studies are required to assess the biological significance of asymmetry in expression patterns of gene pairs.
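
As a rough illustration of the three established measures compared in the abstract, the following minimal sketch computes Pearson correlation, Spearman rank correlation, and a simple mutual information estimate for a pair of expression profiles. The estimators, preprocessing, and parameters actually used in the study are not stated in the abstract, so everything here is an assumption: the equal-width histogram estimator for mutual information, the default of 8 bins, the function names, and the synthetic linear and exponential profiles are all hypothetical choices made for illustration. AC is not sketched because its definition is not given in the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mutual_information(x, y, bins=8):
    """Plug-in mutual information (bits) from an equal-width 2-D histogram.

    The estimator and the bin count are illustrative assumptions,
    not the method used in the study.
    """
    def bin_index(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    n = len(x)
    lox, hix, loy, hiy = min(x), max(x), min(y), max(y)
    joint = {}
    px = [0] * bins
    py = [0] * bins
    for a, b in zip(x, y):
        i, j = bin_index(a, lox, hix), bin_index(b, loy, hiy)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    # Sum p(i,j) * log2( p(i,j) / (p(i) * p(j)) ) over occupied cells.
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


# Synthetic, noise-free profiles (hypothetical data): one linear, one curved.
gene_a = [i / 10 for i in range(1, 51)]
linear_b = [2 * v + 1 for v in gene_a]
curved_b = [math.exp(v) for v in gene_a]

for label, gene_b in (("linear", linear_b), ("curved", curved_b)):
    print(label,
          round(pearson(gene_a, gene_b), 3),
          round(spearman(gene_a, gene_b), 3),
          round(mutual_information(gene_a, gene_b), 3))
```

Because the curved profile is monotonic, its rank correlation stays at 1 while its Pearson coefficient drops below 1, which shows in a very simplified way how the measures can diverge for curved relationships. Real expression data would add noise, and the relationship between the measures may then differ from this noise-free toy case.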
