Building a Research-Quality Copy Number Variation Data Repository for Translational Research

Copy number variation (CNV) has known associations with population diversity and disease conditions. However, research communities face great challenges in reusing CNV data because existing CNV data sources are heterogeneous. The objective of this study was to design, develop, and evaluate a scalable CNV data repository, based on a proposed common data schema, that facilitates the integration and reuse of research-quality CNV data. We derived the proposed CNV common data schema by analyzing multiple existing CNV data sources. We designed a collection of CNV quality metrics and demonstrated their usefulness using CNV data from a study of ovarian cancer xenograft models. We implemented the CNV data repository with a MongoDB database backend and established CNV genomic data services that enable reuse of the curated CNV data and support answering CNV-relevant research questions. The critical issues and future plans for system enhancement and community engagement were discussed.
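To make the idea of a common data schema backed by a document store more concrete, the following is a minimal sketch in Python. It is not the schema or the service interface proposed in the study: the field names (`sample_id`, `chrom`, `start`, `end`, `copy_number`, `source`) and the helpers `to_document` and `find_cnvs` are hypothetical and chosen only for illustration. The sketch uses plain in-memory records instead of a live MongoDB connection so that it runs on its own.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass
class CnvRecord:
    """Hypothetical harmonized CNV record (illustrative fields only)."""
    sample_id: str
    chrom: str        # chromosome name, e.g. "8"
    start: int        # 1-based inclusive start coordinate
    end: int          # 1-based inclusive end coordinate
    copy_number: int  # estimated copy number for the segment
    source: str       # originating data source or platform


def to_document(record: CnvRecord) -> Dict[str, object]:
    """Convert a record to a plain dict, the shape a document store would hold."""
    return asdict(record)


def overlaps(record: CnvRecord, chrom: str, start: int, end: int) -> bool:
    """Return True if the record overlaps the closed interval [start, end] on chrom."""
    return record.chrom == chrom and record.start <= end and record.end >= start


def find_cnvs(records: List[CnvRecord], chrom: str, start: int, end: int) -> List[CnvRecord]:
    """Return all records overlapping a genomic region (linear scan for clarity)."""
    return [r for r in records if overlaps(r, chrom, start, end)]


if __name__ == "__main__":
    # Made-up example records, not data from the study.
    records = [
        CnvRecord("S1", "8", 1000, 5000, 4, "array"),
        CnvRecord("S2", "8", 6000, 9000, 1, "exome"),
        CnvRecord("S3", "17", 1000, 5000, 3, "array"),
    ]
    for hit in find_cnvs(records, "8", 4500, 6500):
        print(to_document(hit))
```

In a real deployment, each dictionary produced by `to_document` would be stored as one document, and the region query would be expressed as a database query over indexed `chrom`, `start`, and `end` fields instead of a linear scan.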
