myVCF: a desktop application for high‐throughput mutations data management

Summary: Next‐generation sequencing technologies have become the most powerful tool for discovering genetic variants associated with human diseases. Although dramatic cost reductions have eased their adoption in wet labs and clinics, the huge amount of data they generate is extremely difficult for non‐expert researchers and physicians to manage. There is therefore an urgent need for novel approaches and tools that bring 'end‐users' closer to the sequencing data, ease access for non‐bioinformaticians, and speed up the functional interpretation of genetic variants. We developed myVCF, a standalone, easy‐to‐use desktop application that is based on a browser interface and runs on Windows, Mac and UNIX systems. myVCF is an efficient platform that manages multiple sequencing projects created from VCF files within the system; stores genetic variants and sample genotypes from annotated VCF files in a SQLite database; and implements a flexible search engine for data exploration, allowing queries by chromosomal region, gene, single variant or dbSNP ID. In addition, myVCF generates a summary statistics report on the distribution of mutations across samples and across the genome/exome by aggregating the information within the VCF file. In summary, the myVCF platform allows end‐users without strong programming and bioinformatics skills to explore, query, visualize and export mutation data in a simple and straightforward way.

Availability and implementation: https://apietrelli.github.io/myVCF/

Contact: pietrelli@ingm.org

Supplementary information: Supplementary data are available at Bioinformatics online.
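The storage-and-query workflow described in the summary (variants and sample genotypes loaded from a VCF file into SQLite, then searched by region, gene or dbSNP ID and aggregated per sample) can be illustrated with a minimal Python sketch. This is not myVCF's actual code or schema: the table layout, the function names and the `GENE=` INFO key are assumptions made for illustration only (real annotated VCFs carry gene names inside annotator-specific fields), and the toy VCF text is invented.

```python
import sqlite3

# Toy VCF content (invented for illustration). The GENE= INFO key is a
# hypothetical stand-in for a real annotation field.
VCF_TEXT = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "1\t1000\trs111\tA\tG\t50\tPASS\tGENE=ABC1\tGT\t0/1\t0/0\n"
    "1\t2500\t.\tC\tT\t60\tPASS\tGENE=ABC1\tGT\t1/1\t0/1\n"
    "2\t300\trs222\tG\tA\t40\tPASS\tGENE=XYZ2\tGT\t0/0\t0/1\n"
)


def load_vcf(conn, vcf_text):
    """Store variants and per-sample genotypes in two SQLite tables (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variants "
        "(chrom TEXT, pos INTEGER, rsid TEXT, ref TEXT, alt TEXT, gene TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genotypes "
        "(chrom TEXT, pos INTEGER, sample TEXT, gt TEXT)"
    )
    samples = []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, rsid, ref, alt = fields[0], int(fields[1]), fields[2], fields[3], fields[4]
        info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
        conn.execute(
            "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)",
            (chrom, pos, None if rsid == "." else rsid, ref, alt, info.get("GENE")),
        )
        for sample, call in zip(samples, fields[9:]):
            # This sketch assumes GT is the first FORMAT subfield.
            conn.execute(
                "INSERT INTO genotypes VALUES (?, ?, ?, ?)",
                (chrom, pos, sample, call.split(":")[0]),
            )
    conn.commit()


def query_region(conn, chrom, start, end):
    """Variants on one chromosome within [start, end]."""
    return conn.execute(
        "SELECT * FROM variants WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (chrom, start, end),
    ).fetchall()


def query_gene(conn, gene):
    """Variants annotated with the given gene symbol."""
    return conn.execute(
        "SELECT * FROM variants WHERE gene = ? ORDER BY chrom, pos", (gene,)
    ).fetchall()


def query_rsid(conn, rsid):
    """Variants matching a dbSNP ID."""
    return conn.execute("SELECT * FROM variants WHERE rsid = ?", (rsid,)).fetchall()


def variants_per_sample(conn):
    """Count non-reference genotype calls per sample (a simple summary statistic)."""
    return conn.execute(
        "SELECT sample, COUNT(*) FROM genotypes "
        "WHERE gt NOT IN ('0/0', '0|0', './.', '.|.') "
        "GROUP BY sample ORDER BY sample"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_vcf(conn, VCF_TEXT)
    print(query_region(conn, "1", 900, 2000))
    print(query_gene(conn, "ABC1"))
    print(query_rsid(conn, "rs222"))
    print(variants_per_sample(conn))
```

Keeping genotypes in a separate table from variants is one plausible design choice here: it lets per-sample aggregation run as a single SQL `GROUP BY` without re-parsing the VCF file.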
