Variobox: Automatic Detection and Annotation of Human Genetic Variants

Triggered by the sequencing of the human genome, personalized medicine has been one of the fastest-growing research areas of the last decade. Several projects have developed software and hardware technologies that led to exponential growth in genetic data. As a result, it is now fairly easy and inexpensive to obtain an individual's genetic profile, a service offered by several genetic analysis companies. Computational tools that simplify genetic data analysis and the disclosure of biomedical evidence are therefore of utmost importance. We present Variobox, a desktop tool to annotate, analyze, and compare human genes. Variobox obtains variant annotation data from WAVe, protein metadata annotations from the Protein Data Bank, and sequences from the Locus Reference Genomic (LRG) or RefSeq databases. To explore the data, Variobox provides an advanced sequence visualization that enables agile navigation through genetic regions. DNA sequencing data can be compared with reference sequences retrieved from LRG or RefSeq records, identifying and automatically annotating potential new variants. These features and data, ranging from patient sequences to HGVS-compliant variant descriptions, are combined in an intuitive interface to analyze genes and variants. Variobox is a Java application, available at http://bioinformatics.ua.pt/variobox.
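To make the idea of comparing a patient sequence against a reference concrete, the following is a minimal sketch in Python, not Variobox's actual implementation. It assumes the two sequences are already aligned and of equal length, reports single-base substitutions only (no insertions or deletions), and omits the reference accession that a complete HGVS description would carry. The function name `substitutions` and the `prefix` parameter are hypothetical.

```python
def substitutions(reference: str, sample: str, prefix: str = "g") -> list[str]:
    """List HGVS-style substitutions between two pre-aligned sequences.

    Assumption: both sequences have the same length and share the same
    coordinate system, so position i in one corresponds to position i
    in the other. Indels are out of scope for this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")

    variants = []
    # HGVS positions are 1-based, so enumeration starts at 1.
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, alt_base) in enumerate(pairs, start=1):
        if ref_base != alt_base:
            variants.append(f"{prefix}.{position}{ref_base}>{alt_base}")
    return variants


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(substitutions("ATGGCCTA", "ATGACCTT"))
```

A real comparison would first align the sequencing read to the LRG or RefSeq record and would also have to describe deletions, insertions, and duplications, which this sketch deliberately leaves out.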
