A scalable, aggregated genotypic–phenotypic database for human disease variation

Abstract Next-generation sequencing multi-gene panels have greatly improved the diagnostic yield and cost-effectiveness of genetic testing and are rapidly being integrated into clinical care for hereditary cancer risk. This technology brings a dramatic increase in the volume, type and complexity of data. Yet these invaluable data are too often buried or inaccessible to researchers, especially those without strong analytical or programming skills. To effectively share comprehensive, integrated genotypic–phenotypic data, we built Color Data, a publicly available, cloud-based database that supports broad access and data literacy. The database contains data from 50 000 individuals who were sequenced for 30 genes associated with hereditary cancer risk. It provides information on allele frequency and variant classification, as well as associated phenotypic information such as demographics and personal and family history. The user-friendly interface lets researchers run their own filtered queries, and query results can be shared and/or downloaded. Rapid and broad dissemination of these research results will help increase the value of, and reduce the waste in, scientific resources and data. Furthermore, the database can scale quickly and support the integration of additional genes and human hereditary conditions. We hope that this database will help researchers and scientists explore genotype–phenotype correlations in hereditary cancer, identify novel variants for functional analysis and enable data-driven drug discovery and development.
