Co-expressed Pathways DataBase for Tomato: a database to predict pathways relevant to a query gene

Background
Gene co-expression, the similarity of gene expression profiles under various experimental conditions, has been used as an indicator of functional relationships between genes, and many co-expression databases have been developed for predicting gene functions. These databases usually provide users with a co-expression network and a list of strongly co-expressed genes for a query gene. Several of these databases also provide functional information on the set of strongly co-expressed genes (i.e., the biological processes and pathways that are enriched among them), which is generally obtained via over-representation analysis (ORA). A possible limitation of this approach is that users can predict gene functions based only on the strongly co-expressed genes.

Results
In this study, we developed a new co-expression database that enables users to predict the function of tomato genes from the results of functional enrichment analyses of co-expressed genes, while also considering genes that are not strongly co-expressed. To achieve this, we applied the ORA approach with several thresholds for selecting co-expressed genes, and performed gene set enrichment analysis (GSEA) on a list of genes ranked by their degree of co-expression. We found that internal correlation within pathways affected the significance levels of the enrichment analyses. Therefore, we introduced a new measure for evaluating the relationship between a gene and a pathway, termed the percentile (p)-score, which enables users to predict functionally relevant pathways without being affected by the internal correlation within pathways. In addition, we evaluated our approaches using receiver operating characteristic curves, which indicated that the p-score could improve the performance of ORA.

Conclusions
We developed a new database, named Co-expressed Pathways DataBase for Tomato, which is available at http://cox-path-db.kazusa.or.jp/tomato. The database allows users to predict pathways that are relevant to a query gene, which would help to infer gene functions.
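
To make the idea of ORA with several thresholds concrete, the following is a minimal sketch, not the database's actual implementation. It assumes that ORA is carried out with a one-sided hypergeometric test, a common choice that the abstract does not confirm, and that genes are already ranked by decreasing co-expression with the query gene. The function names `hypergeom_sf` and `ora_at_thresholds`, the gene identifiers, and the threshold values are all hypothetical. The p-score is not sketched here because the abstract does not give its definition.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for a hypergeometric draw.

    N: number of genes in the background
    K: number of background genes annotated to the pathway
    n: number of top co-expressed genes selected
    k: number of selected genes annotated to the pathway
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def ora_at_thresholds(ranked_genes, pathway, thresholds):
    """Run ORA on the top-n co-expressed genes for each threshold n.

    ranked_genes: genes ordered by decreasing co-expression with the query
                  (assumed input; the ranking itself is not computed here).
    pathway: set of genes annotated to one pathway.
    thresholds: list of top-n cutoffs (illustrative values, not the paper's).
    Returns {n: (overlap, p_value)}.
    """
    background = set(ranked_genes)
    N = len(ranked_genes)
    K = len(pathway & background)
    results = {}
    for n in thresholds:
        selected = set(ranked_genes[:n])
        k = len(selected & pathway)
        results[n] = (k, hypergeom_sf(k, N, K, n))
    return results


if __name__ == "__main__":
    # Toy data: 20 hypothetical genes, 4 of them in a hypothetical pathway.
    ranked = [f"g{i}" for i in range(20)]
    pathway = {"g0", "g1", "g2", "g15"}
    for n, (k, p) in ora_at_thresholds(ranked, pathway, [5, 10]).items():
        print(f"top {n}: overlap={k}, p={p:.4f}")
```

Comparing the p-values across thresholds shows how a pathway that is significant among the most strongly co-expressed genes can lose significance as the cutoff is relaxed, which is one reason to examine more than a single threshold.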
