EPSILON: an eQTL prioritization framework using similarity measures derived from local networks

MOTIVATION: When genomic data are associated with gene expression data, the resulting expression quantitative trait loci (eQTL) will likely span multiple genes. eQTL prioritization techniques can be used to select the most likely causal gene affecting the expression of a target gene from a list of candidates. As an input, these techniques use physical interaction networks that often contain highly connected genes and unreliable or irrelevant interactions that can interfere with the prioritization process. We present EPSILON, an extendable framework for eQTL prioritization, which mitigates the effect of highly connected genes and unreliable interactions by constructing a local network before a network-based similarity measure is applied to select the true causal gene.

RESULTS: We tested the new method on three eQTL datasets derived from yeast data using three different association techniques. A physical interaction network was constructed, and each eQTL in each dataset was prioritized using the EPSILON approach: first, a local network was constructed using a k-trials shortest path algorithm, followed by the calculation of a network-based similarity measure. Three similarity measures were evaluated: random walks, the Laplacian Exponential Diffusion kernel and the Regularized Commute-Time kernel. The aim was to predict knockout interactions from a yeast knockout compendium. EPSILON outperformed two reference prioritization methods, random assignment and shortest path prioritization. Next, we found that using a local network significantly increased prioritization performance in terms of predicted knockout pairs when compared with using exactly the same network similarity measures on the global network, with an average increase in prioritization performance of 8 percentage points (P < 10^-5).

AVAILABILITY: The physical interaction network and the source code (Matlab/C++) of our implementation can be downloaded from http://bioinformatics.intec.ugent.be/epsilon.

CONTACT: lieven.verbeke@intec.ugent.be, kamar@psb.ugent.be, jan.fostier@intec.ugent.be

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
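
The two-step procedure summarized under RESULTS (build a local network, then score candidates with a network similarity measure) can be illustrated with a minimal sketch. This is not the authors' Matlab/C++ implementation. The abstract does not specify the k-trials shortest path algorithm, so the sketch assumes a simplified stand-in: the local network is the union of one breadth-first shortest path from each candidate gene to the target gene. The similarity measure is a random walk with restart started at the target. All function names, the restart probability of 0.3 and the toy graph are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path(adj, source, target):
    """Return one shortest path (list of nodes) found by BFS, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nb in sorted(adj[node]):
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    return None


def local_network(adj, candidates, target):
    """Assumed stand-in for the k-trials step: keep only nodes that lie
    on one shortest path from some candidate to the target."""
    keep = {target}
    for cand in candidates:
        path = shortest_path(adj, cand, target)
        if path:
            keep.update(path)
    return {n: adj[n] & keep for n in keep}


def random_walk_with_restart(adj, start, restart=0.3, iters=200):
    """Iterate p <- (1 - restart) * W p + restart * e_start, where W
    spreads each node's probability evenly over its neighbours."""
    p = {n: 0.0 for n in adj}
    p[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for node, mass in p.items():
            if adj[node]:
                share = (1.0 - restart) * mass / len(adj[node])
                for nb in adj[node]:
                    nxt[nb] += share
            else:
                # Isolated node: the non-restarting mass stays in place.
                nxt[node] += (1.0 - restart) * mass
        nxt[start] += restart
        p = nxt
    return p


def prioritize(adj, candidates, target):
    """Rank candidates by their similarity to the target on the local network."""
    local = local_network(adj, candidates, target)
    scores = random_walk_with_restart(local, target)
    return sorted(candidates, key=lambda c: scores.get(c, 0.0), reverse=True)


# Toy network: target T, candidates A, B, C; H is a hub with extra neighbours.
edges = [("A", "T"), ("B", "X"), ("X", "T"), ("C", "H"), ("H", "T"),
         ("H", "N1"), ("H", "N2"), ("H", "N3")]
adj = build_adjacency(edges)
candidates = ["A", "B", "C"]

local = local_network(adj, candidates, "T")
local_scores = random_walk_with_restart(local, "T")
global_scores = random_walk_with_restart(adj, "T")
ranking = prioritize(adj, candidates, "T")
```

In this toy graph, candidates B and C are both two steps from the target. On the global network C scores below B because the hub H spreads the walker's probability over its additional neighbours. On the local network those neighbours are removed and the two candidates score equally, which is the kind of hub effect the local network is meant to dampen.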
