ATTED-II in 2018: A Plant Coexpression Database Based on Investigation of the Statistical Property of the Mutual Rank Index

Abstract

ATTED-II (http://atted.jp) is a coexpression database for plant species that aids in the discovery of relationships of unknown genes within a species. As an advanced coexpression analysis method, multispecies comparison has the potential to detect alterations in gene relationships within an evolutionary context. However, the validity of comparative coexpression studies is difficult to determine without quantitative assessments of the quality of the coexpression data. ATTED-II (version 9) provides 16 coexpression platforms for nine plant species, including seven species supported by both microarray- and RNA sequencing (RNAseq)-based coexpression data. Having two independent sources of coexpression data makes it possible to assess the reproducibility of coexpression. The latest coexpression data for Arabidopsis (Ath-m.c7-1 and Ath-r.c3-0) showed higher reproducibility (Jaccard coefficient = 0.13) than any previous coexpression data in ATTED-II. We also investigated the statistical basis of the mutual rank (MR) index as a coexpression measure by bootstrap sampling of experimental units. We found that the error distribution of the logit-transformed MR index was normal, with equal variances, for each coexpression platform. Because the MR error was strongly correlated with the number of samples underlying the coexpression data, typical confidence intervals for the MR index can be estimated for any coexpression platform. These new, high-quality coexpression data can be analyzed with any tool in ATTED-II and combined with external resources to obtain insight into plant biology.
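The quantities named in the abstract (the MR index, its logit transform, bootstrap resampling of samples, and the Jaccard coefficient) can be illustrated with a minimal Python sketch. It is not the ATTED-II pipeline. It assumes Pearson correlation, takes MR as the geometric mean of the two directional correlation ranks, and scales MR by the number of genes before the logit, which is an assumption made here only to map MR into (0, 1). All function names are hypothetical, and the input is random data rather than real expression data.

```python
import numpy as np


def mutual_rank(expr):
    """MR matrix for a genes x samples array (assumes Pearson correlation)."""
    corr = np.corrcoef(expr)
    np.fill_diagonal(corr, -np.inf)          # push each gene's self-match to last rank
    order = np.argsort(-corr, axis=1)        # partners sorted from most to least correlated
    n = corr.shape[0]
    ranks = np.empty_like(order)
    rows = np.arange(n)[:, None]
    ranks[rows, order] = np.arange(1, n + 1)  # ranks[i, j] = rank of gene j seen from gene i
    mr = np.sqrt(ranks * ranks.T).astype(float)  # geometric mean of the two directional ranks
    np.fill_diagonal(mr, np.nan)             # self-pairs are not meaningful
    return mr


def logit_mr(mr, n_genes):
    """Logit of MR scaled to (0, 1); the scaling by n_genes is an assumption."""
    p = mr / n_genes
    return np.log(p / (1.0 - p))


def top_k_partners(mr, gene, k):
    """Indices of the k genes with the smallest MR to `gene` (NaN sorts last)."""
    return set(np.argsort(mr[gene])[:k].tolist())


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bootstrap_logit_mr_sd(expr, n_boot=50, seed=0):
    """Per-pair standard deviation of logit(MR) over bootstrap resamples of samples."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = expr.shape
    reps = []
    for _ in range(n_boot):
        cols = rng.integers(0, n_samples, size=n_samples)  # resample columns with replacement
        reps.append(logit_mr(mutual_rank(expr[:, cols]), n_genes))
    return np.std(np.stack(reps), axis=0)


# Toy data: two noisy "platforms" measuring the same 20 genes (purely synthetic).
rng = np.random.default_rng(1)
signal = rng.normal(size=(20, 30))
platform_a = signal + 0.5 * rng.normal(size=signal.shape)
platform_b = signal + 0.5 * rng.normal(size=signal.shape)

mr_a, mr_b = mutual_rank(platform_a), mutual_rank(platform_b)
reproducibility = jaccard(top_k_partners(mr_a, 0, 5), top_k_partners(mr_b, 0, 5))
logit_sd = bootstrap_logit_mr_sd(platform_a, n_boot=20)
```

Here `reproducibility` compares the top five partners of one gene across the two synthetic platforms, and `logit_sd` gives a per-pair spread of the logit-transformed MR. On real data, that spread would be expected to shrink as the number of samples grows, in line with the abstract's observation that the MR error correlates with sample count.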
