Development of Animal QTLdb and CorrDB: Resynthesizing Big Data to Improve Meta-analysis of Genetic and Genomic Information

In the age of big data, biological databases must rapidly develop their data infrastructure to accommodate abundant data in a structured manner that supports meta-analysis. Livestock genetic and phenotypic correlation data from studies carried out over the past 70+ years, together with quantitative trait loci (QTL) mapping results from studies over the past 25+ years, provide a large body of information for adding new types of annotations to animal genomes. The growth of Animal QTLdb and CorrDB over the past decade gives researchers valuable tools for using historical and future phenotype/genotype data to elucidate the genetic mechanisms behind livestock production improvements. Our recent efforts in extensive data curation, data quality maintenance, new web tool development, and collaborative database expansion provide convenient platforms for data queries and analysis, serving the phenotype/genotype data collection needs of the livestock genetics/genomics community. Over more than 13 (QTLdb) and 5 (CorrDB) years of development, the applications built for Animal QTLdb and CorrDB have embraced the big data era, in which meta-analysis has begun to demonstrate its power and utility for resynthesizing data to improve genetic analysis. To date, 136,137 QTL/association data have been curated from 1,881 journal articles, representing 1,890 different traits in 6 livestock animal species. We map all QTL/correlation trait data to ontology terms so that they can be linked through underlying information networks. By developing trait-centric and gene-centric views of the QTL/association data, vast amounts of phenotype/genotype data can now be summarized in helpful new ways. In addition, we continue to expand the types of data collected for inclusion.
The most recent addition is “supplementary data,” e.g., original genotypes, phenotypes, and near-significant or other association/QTL data from the same experiment that may not be part of official publications. The inclusion of such data may add value to the “big data” pool when meta-analyses are conducted. The most critical developmental work, although not obvious to the public, is the refinement of curation tools and workflow to strengthen data quality control and maintenance. For example, we have added several new data status types, along with corresponding procedures, to better manage data flow within the curator/editor pipelines. The goals of our ongoing database development are not only to facilitate data collection, curation, and annotation, but also to provide mechanisms that support new types of data reanalysis, combined analysis, and data mining that may lead to new discoveries.
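The idea of mapping trait data to ontology terms and then summarizing records in trait-centric and gene-centric views can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record fields, the gene names, the term identifiers, and the `build_views` helper are hypothetical and do not reflect the actual Animal QTLdb or CorrDB schema or code.

```python
from collections import defaultdict

# Hypothetical QTL/association records; field names and values are
# illustrative only and are not taken from Animal QTLdb.
records = [
    {"qtl_id": 1, "species": "cattle", "trait": "milk yield", "gene": "GENE_A"},
    {"qtl_id": 2, "species": "pig", "trait": "backfat thickness", "gene": "GENE_B"},
    {"qtl_id": 3, "species": "cattle", "trait": "Milk Yield", "gene": "GENE_B"},
]

# Hypothetical mapping from free-text trait names to ontology term IDs.
trait_to_term = {
    "milk yield": "TERM:0001",
    "backfat thickness": "TERM:0002",
}


def build_views(records, trait_to_term):
    """Group record IDs by ontology term (trait-centric) and by gene (gene-centric)."""
    trait_view = defaultdict(list)
    gene_view = defaultdict(list)
    for rec in records:
        # Normalize the free-text trait name before looking up its term.
        term = trait_to_term.get(rec["trait"].strip().lower())
        if term is None:
            continue  # unmapped traits are skipped in this sketch
        trait_view[term].append(rec["qtl_id"])
        gene_view[rec["gene"]].append(rec["qtl_id"])
    return dict(trait_view), dict(gene_view)


trait_view, gene_view = build_views(records, trait_to_term)
print(trait_view)
print(gene_view)
```

Because differently spelled trait names resolve to the same term, records from separate studies end up in one group, which is the property that makes cross-study summaries possible.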
