Mutational Data Loading Routines for Human Genome Databases: The BRCA1 Case

Over the last decades, a large amount of research in the genomics domain has generated, and continues to generate, terabytes if not exabytes of information, stored globally in a highly fragmented way. Different databases store the same data in different ways, resulting in undesired redundancy and restricted information transfer. Moreover, keeping existing databases consistent and maintaining their data integrity is mainly left to human intervention, which is costly in both time and money, as well as error-prone. Identifying a fixed conceptual dictionary in the form of a conceptual model therefore seems crucial. This paper presents an effort to integrate the mutational data from the established genomic data source HGMD (Human Gene Mutation Database) into HGDB, a database driven by a conceptual model, and draws from it useful lessons for improving the existing conceptual model of the human genome.
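The abstract does not describe how the HGMD-to-HGDB loading routines work. As a purely illustrative sketch, not the authors' implementation, the Python below shows the general shape of such a routine: source records are parsed into a single conceptual-model entity, so the same mutation written in slightly different ways collapses into one record, and anything the model cannot represent is set aside for review. The `Mutation` entity, the `(gene, notation)` input format and the function names are hypothetical assumptions, and only simple HGVS-style coding substitutions such as `c.100A>G` are handled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: matches only simple coding-DNA substitutions such as
# "c.100A>G". Other HGVS variant types (deletions, insertions, duplications,
# intronic offsets) are deliberately out of scope for this sketch.
SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


@dataclass(frozen=True)
class Mutation:
    """Hypothetical conceptual-model entity for a point substitution."""
    gene: str
    position: int  # coding-DNA position as written in the notation
    ref: str
    alt: str


def parse_substitution(gene: str, notation: str) -> Optional[Mutation]:
    """Return a Mutation, or None if the notation is not a simple substitution."""
    match = SUBSTITUTION.match(notation.strip())
    if match is None:
        return None
    position, ref, alt = match.groups()
    if ref == alt:
        return None  # identical bases describe no change
    return Mutation(gene.strip().upper(), int(position), ref, alt)


def load(records):
    """Parse (gene, notation) pairs, dropping duplicates and collecting rejects.

    Records describing the same substitution (even with different gene-symbol
    case or surrounding whitespace) collapse into one Mutation, mirroring the
    goal of removing redundancy across sources.
    """
    loaded = {}  # dict preserves first-seen insertion order
    rejected = []
    for gene, notation in records:
        mutation = parse_substitution(gene, notation)
        if mutation is None:
            rejected.append((gene, notation))
        else:
            loaded.setdefault(mutation, None)
    return list(loaded), rejected
```

For example, loading `("BRCA1", "c.100A>G")` together with `("brca1", " c.100A>G ")` yields a single `Mutation`, while `("BRCA1", "c.200del")` lands in the rejected list because the sketch does not model deletions. Keeping rejects instead of silently dropping them leaves a trail for the human review the abstract mentions.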