Effective similarity measures for expression profiles

It is commonly accepted that genes with similar expression profiles are functionally related. However, there are many ways to measure the similarity of expression profiles, and it is not clear a priori which is the most effective. Moreover, no clear distinction has so far been made regarding the type of functional link between genes that microarray data suggest. Similarly expressed genes can be part of the same complex as interacting partners; they can participate in the same pathway without interacting directly; they can perform similar functions; or they can simply have similar regulatory sequences. Here we conduct a study of the notion of functional link as implied by expression data. We analyze different similarity measures of gene expression profiles and assess their usefulness and robustness in detecting biological relationships by comparing the similarity scores with results obtained from databases of interacting proteins, promoter signals and cellular pathways, as well as through sequence comparisons. We also introduce variations on similarity measures that are based on statistical analysis and that better discriminate between genes that are functionally close and genes that are functionally distant. Our tools can be used to assess other similarity measures for expression profiles, and are accessible at biozon.org/tools/expression/
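
To make the notion of a similarity measure for expression profiles concrete, the following is a minimal sketch of two common choices (Pearson and Spearman correlation) together with one possible statistically normalized variation, a z-score of a pair's similarity relative to a background of similarities. The abstract does not specify which measures or which normalization the study uses, so the function names (`pearson`, `spearman`, `zscore_similarity`) and the choice of background are assumptions made purely for illustration.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    # Undefined (division by zero) if either profile is constant.
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def ranks(x):
    """Ranks starting at 1, with tied values given their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    r = np.empty(len(x))
    r[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        mask = x == v
        r[mask] = r[mask].mean()
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def zscore_similarity(profiles, i, j, measure=pearson):
    """Hypothetical normalized score: similarity of genes i and j expressed
    as a z-score against the similarities of gene i to every other gene
    (the background here includes gene j itself)."""
    s_ij = measure(profiles[i], profiles[j])
    background = np.array([
        measure(profiles[i], profiles[k])
        for k in range(len(profiles)) if k != i
    ])
    return float((s_ij - background.mean()) / background.std())
```

A raw correlation treats every gene alike, whereas a normalized score of this kind asks how unusual a pair's similarity is given how gene `i` correlates with everything else, which is one way a measure could separate close from distant pairs.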
