Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker

Unambiguous and correct sequence variant descriptions are of utmost importance, not least because mistakes and uncertainties may lead to errors in clinical diagnosis. We developed the Mutation Analyzer (Mutalyzer) sequence variation nomenclature checker (www.lovd.nl/mutalyzer; last accessed 13 September 2007) for automated analysis and correction of sequence variant descriptions using reference sequences from any organism. Mutalyzer handles most variation types (substitution, deletion, duplication, insertion, indel, and splice‐site changes), following the current recommendations of the Human Genome Variation Society (HGVS). Input consists of a GenBank accession number (or an uploaded reference sequence file in GenBank format with user‐modified annotation), an HGNC gene symbol, and the variant description (a single variant or a batch file). Mutalyzer generates variant descriptions at the DNA level and at the level of all annotated transcripts, together with the deduced outcome at the protein level. To validate Mutalyzer's performance and to investigate the quality of sequence variant descriptions in locus‐specific mutation databases (LSDBs), more than 11,000 variants in the PAH, BIC BRCA2, and HbVar databases were analyzed, showing that 87%, 25%, and 38%, respectively, were error‐free and followed the recommendations. Low recognition rates in BIC and HbVar (38% and 51%, respectively) were due to the lack of a well‐annotated genomic reference sequence (HbVar) or noncompliance with the guidelines (BRCA2). Provided with well‐annotated genomic reference sequences, Mutalyzer is very effective for the curation of both newly discovered sequence variant descriptions and existing LSDB data. Mutalyzer will be linked to the Leiden Open source Variation Database (LOVD) (www.LOVD.nl; last accessed 13 September 2007) and is the first module of a sequence variant effect prediction package. Hum Mutat 29(1), 6–13, 2008. © 2007 Wiley‐Liss, Inc.
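
To make the variation types listed above concrete, the following is a minimal sketch of how simple DNA-level descriptions in HGVS style might be sorted into those types. It is not Mutalyzer's implementation: the function name `classify_variant` and the regular expressions are assumptions made for illustration, they cover only plain coding-DNA positions (no intronic offsets, no reference sequence prefix, no splice-site handling), and they do not check a description against a reference sequence.

```python
import re

# Hypothetical, simplified patterns for HGVS-style coding-DNA descriptions.
# Real HGVS syntax is considerably richer; this is an illustration only.
_POS = r"\d+(?:_\d+)?"  # a single position or a start_end range

_PATTERNS = [
    ("substitution", re.compile(r"^c\.\d+[ACGT]>[ACGT]$")),
    ("indel", re.compile(rf"^c\.{_POS}delins[ACGT]+$")),
    ("deletion", re.compile(rf"^c\.{_POS}del$")),
    ("duplication", re.compile(rf"^c\.{_POS}dup$")),
    ("insertion", re.compile(r"^c\.\d+_\d+ins[ACGT]+$")),
]


def classify_variant(description):
    """Return the variation type of a description, or None if unrecognized."""
    for name, pattern in _PATTERNS:
        if pattern.match(description):
            return name
    return None


if __name__ == "__main__":
    for example in ["c.76A>T", "c.76_78del", "c.76dup",
                    "c.76_77insT", "c.76_78delinsTG", "76A>T"]:
        print(example, "->", classify_variant(example))
```

A description that matches none of the patterns returns `None`, which in a real checker would be the point at which an error or a suggested correction is reported.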
