Human Variome Project Quality Assessment Criteria for Variation Databases

Numerous databases containing information about DNA, RNA, and protein variations are available. Gene-specific variant databases (locus-specific variation databases, LSDBs) are typically curated and maintained for single genes or for groups of genes associated with a particular disease or diseases. These databases are widely considered the most reliable information source for a particular gene, protein, or disease, but their contents, infrastructure, and quality vary widely. Evaluating quality is important because these databases may affect health decision-making, research, and clinical practice. The Human Variome Project (HVP) established a Working Group for Variant Database Quality Assessment. Its guiding principle was to develop a simple system that nevertheless provides a good overview of the quality of a database. The resulting HVP quality evaluation criteria are divided into four main components: data quality, technical quality, accessibility, and timeliness. This report elaborates on the quality criteria and describes how the quality scheme can be implemented. Examples show the current status of the quality items in two databases: BTKbase, an LSDB, and ClinVar, a central archive of submissions about variants and their clinical significance.
