Graph ranking for exploratory gene data analysis

Background: Microarray technology has made it possible to monitor the expression levels of thousands of genes simultaneously in a single experiment. However, the large number of genes makes the resulting data much harder to analyze, comprehend and interpret. Selecting a subset of important genes is necessary to address this challenge. Gene selection has been investigated extensively over the last decade. Most selection procedures, however, are not sufficient for accurate inference of the underlying biology, because biologically significant genes are not necessarily statistically significant. Additional biological knowledge needs to be integrated into the gene selection procedure.

Results: We propose a general framework for gene ranking. We construct a bipartite graph from the Gene Ontology (GO) and gene expression data. The graph describes the relationship between genes and their associated molecular functions. Under a given species condition, the edge weights of the graph are set to the gene expression levels. Such a graph provides a mathematical means to represent both species-independent and species-dependent biological information. We also develop a new ranking algorithm that analyzes the weighted graph via a kernelized spatial depth (KSD) approach. Consequently, the importance of genes and molecular functions can be ranked simultaneously by a real-valued measure, KSD, which incorporates both the global and the local structure of the graph. Over-expressed and under-expressed genes can also be ranked separately.

Conclusion: The gene-function bigraph integrates molecular function annotations into gene expression data. The relevance of genes to one another is described in the graph through their common functions. The proposed method provides an exploratory framework for gene data analysis.
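The abstract does not spell out the construction, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' implementation. It builds a weighted gene-function bipartite graph in which every edge of a gene carries that gene's expression level, and then ranks all nodes by a kernelized spatial depth. The gene and function names are hypothetical toy data, expression levels are assumed positive, and the pseudo-inverse of the graph Laplacian is used as the node kernel purely as an illustrative choice. The depth of a node x is taken as 1 minus the norm of the average unit vector from the other nodes to x in feature space, with distances computed from the kernel.

```python
import numpy as np

def build_bigraph(expr, annotations, functions):
    """Return (genes, B): B[i, j] is the expression level of gene i
    if gene i is annotated with function j, and 0 otherwise."""
    genes = list(expr)
    B = np.zeros((len(genes), len(functions)))
    for i, g in enumerate(genes):
        for f in annotations.get(g, ()):
            B[i, functions.index(f)] = expr[g]
    return genes, B

def laplacian_pinv_kernel(B):
    """Assumed kernel: pseudo-inverse of the Laplacian of the bipartite
    graph whose nodes are the genes followed by the functions."""
    m, n = B.shape
    A = np.block([[np.zeros((m, m)), B], [B.T, np.zeros((n, n))]])
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.pinv(L, rcond=1e-10)

def kernelized_spatial_depth(K):
    """Spatial depth of every node, evaluated in the feature space of K."""
    n = K.shape[0]
    diag = np.diag(K)
    # Feature-space distances: ||phi(x) - phi(y)||^2 = k(x,x) + k(y,y) - 2 k(x,y)
    dist = np.sqrt(np.maximum(diag[:, None] + diag[None, :] - 2 * K, 0.0))
    depth = np.empty(n)
    for i in range(n):
        mask = dist[i] > 1e-12  # skip x itself and exact duplicates of x
        k_i = K[i, mask]
        # <phi(x)-phi(y), phi(x)-phi(z)> = k(x,x) + k(y,z) - k(x,y) - k(x,z)
        G = K[i, i] + K[np.ix_(mask, mask)] - k_i[:, None] - k_i[None, :]
        W = G / np.outer(dist[i, mask], dist[i, mask])
        depth[i] = 1.0 - np.sqrt(max(W.sum(), 0.0)) / n
    return depth

# Hypothetical toy data: three genes, two molecular functions.
expr = {"geneA": 2.0, "geneB": 1.0, "geneC": 2.0}
annotations = {"geneA": ["func1"], "geneB": ["func1", "func2"], "geneC": ["func1"]}
functions = ["func1", "func2"]

genes, B = build_bigraph(expr, annotations, functions)
depth = kernelized_spatial_depth(laplacian_pinv_kernel(B))
for name, value in sorted(zip(genes + functions, depth), key=lambda t: -t[1]):
    print(f"{name}: {value:.3f}")
```

In this sketch a larger depth marks a node that is more central with respect to the whole graph, so genes and functions end up on one common scale. Ranking over-expressed and under-expressed genes separately would presumably mean building one graph per group, which is not shown here.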

[1]  Bing Zhang,et al.  GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies , 2004, BMC Bioinformatics.

[2]  D Haussler,et al.  Knowledge-based analysis of microarray gene expression data by using support vector machines. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[3]  S. Dudoit,et al.  Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data , 2002 .

[4]  Cesare Furlanello,et al.  Entropy-based gene ranking without selection bias for the predictive classification of microarray data , 2003, BMC Bioinformatics.

[5]  Christina Kendziorski,et al.  On Differential Variability of Expression Ratios: Improving Statistical Inference about Gene Expression Changes from Microarray Data , 2001, J. Comput. Biol..

[6]  Mei-Ling Ting Lee,et al.  Analysis of Microarray Gene Expression Data , 2004, Springer US.

[7]  Li Wang,et al.  CGI: a new approach for prioritizing genes by combining gene expression and protein-protein interaction data , 2007, Bioinform..

[8]  Desmond J. Higham,et al.  GeneRank: Using search engine technology for the analysis of microarray experiments , 2005, BMC Bioinformatics.

[9]  Yuanyuan Ding,et al.  Improving the Performance of SVM-RFE to Select Genes in Microarray Data , 2006, BMC Bioinformatics.

[10]  Gary A. Churchill,et al.  Analysis of Variance for Gene Expression Microarray Data , 2000, J. Comput. Biol..

[11]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[12]  Yixin Chen,et al.  Outlier Detection with the Kernelized Spatial Depth Function , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Jörg Rahnenführer,et al.  Robert Gentleman, Vincent Carey, Wolfgang Huber, Rafael Irizarry, Sandrine Dudoit (2005): Bioinformatics and Computational Biology Solutions Using R and Bioconductor , 2009 .

[14]  Marina Vannucci,et al.  Gene selection: a Bayesian variable selection approach , 2003, Bioinform..

[15]  John D. Lafferty,et al.  Diffusion Kernels on Graphs and Other Discrete Input Spaces , 2002, ICML.

[16]  Robert Gentleman,et al.  Using GOstats to test gene lists for GO term association , 2007, Bioinform..

[17]  Ramón Díaz-Uriarte,et al.  Gene selection and classification of microarray data using random forest , 2006, BMC Bioinformatics.

[18]  Tong Zhang,et al.  Learning on Graph with Laplacian Regularization , 2006, NIPS.

[19]  Thomas Lengauer,et al.  Improved scoring of functional groups from gene expression data by decorrelating GO graph structure , 2006, Bioinform..

[20]  Jiri Aubrecht,et al.  Differentiating mechanisms of toxicity using global gene expression analysis in Saccharomyces cerevisiae. , 2005, Mutation research.

[21]  Inderjit S. Dhillon,et al.  Co-clustering documents and words using bipartite spectral graph partitioning , 2001, KDD '01.

[22]  John D. Storey,et al.  Statistical significance for genomewide studies , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[23]  Soumen Chakrabarti,et al.  Learning random walks to rank nodes in graphs , 2007, ICML '07.

[24]  R. Serfling,et al.  Influence functions of some depth functions, and application to depth-weighted L-statistics , 2009 .

[25]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[26]  Y. Dodge on Statistical data analysis based on the L1-norm and related methods , 1987 .

[27]  Yuanyuan Ding,et al.  Robust clustering in high dimensional data using statistical depths , 2007, BMC Bioinformatics.

[28]  Robert Tibshirani,et al.  Statistical Significance for Genome-Wide Experiments , 2003 .

[29]  Fan Chung,et al.  Spectral Graph Theory , 1996 .

[30]  Rong Jin,et al.  A Novel Method Incorporating Gene Ontology Information for Unsupervised Clustering and Feature Selection , 2008, PloS one.

[31]  Martin Vingron,et al.  An Improved Statistic for Detecting Over-Represented Gene Ontology Annotations in Gene Sets , 2006, RECOMB.

[32]  U. Feige,et al.  Spectral Graph Theory , 2015 .

[33]  Chris H. Q. Ding,et al.  Bipartite graph partitioning and data clustering , 2001, CIKM '01.

[34]  Li Wang,et al.  Hybrid huberized support vector machines for microarray classification and gene selection , 2008, Bioinform..

[35]  Roded Sharan,et al.  Discovering statistically significant biclusters in gene expression data , 2002, ISMB.

[36]  M. Schummer,et al.  Selecting Differentially Expressed Genes from Microarray Experiments , 2003, Biometrics.

[37]  Nada Lavrac,et al.  SEGS: Search for enriched gene sets in microarray data , 2008, J. Biomed. Informatics.

[38]  Stephen J. Roberts,et al.  A theoretical analysis of gene selection , 2004, Proceedings. 2004 IEEE Computational Systems Bioinformatics Conference, 2004. CSB 2004..

[39]  R. Serfling A Depth Function and a Scale Curve Based on Spatial Quantiles , 2002 .

[40]  David Martin,et al.  GOToolBox: functional analysis of gene datasets based on Gene Ontology , 2004, Genome Biology.

[41]  Purvesh Khatri,et al.  Ontological analysis of gene expression data: current tools, limitations, and open problems , 2005, Bioinform..

[42]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[43]  Paul Van Dooren,et al.  On the pseudo-inverse of the Laplacian of a bipartite graph , 2005, Appl. Math. Lett..

[44]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[45]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[46]  Russ B. Altman,et al.  M-BISON: Microarray-based integration of data sources using networks , 2008, BMC Bioinformatics.