VCF-Miner: GUI-based application for mining variants and annotations stored in VCF files

Next-generation sequencing platforms are widely used to discover variants associated with disease. The processing of sequencing data involves read alignment, variant calling, variant annotation and variant filtering. The standard file format to hold variant calls is the variant call format (VCF) file. According to the format specifications, any arbitrary annotation can be added to the VCF file for downstream processing. However, most downstream analysis programs disregard annotations already present in the VCF and re-annotate variants using the annotation provided by that particular program. This precludes investigators who have collected information on variants from literature or other sources from including these annotations in the filtering and mining of variants. We have developed VCF-Miner, a graphical user interface-based stand-alone tool, to mine variants and annotation stored in the VCF. Powered by a MongoDB database engine, VCF-Miner enables the stepwise trimming of non-relevant variants. The grouping feature implemented in VCF-Miner can be used to identify somatic variants by contrasting variants in tumor and in normal samples or to identify recessive/dominant variants in family studies. It is not limited to human data, but can also be extended to include non-diploid organisms. It also supports copy number or any other variant type supported by the VCF specification. VCF-Miner can be used on a personal computer or large institutional servers and is freely available for download from http://bioinformaticstools.mayo.edu/research/vcf-miner/.

[1]  L. P. Zhao,et al.  Assessing familial aggregation of age at onset, by using estimating equations, with application to breast cancer. , 1996, American journal of human genetics.

[2]  P. Bork,et al.  A method and server for predicting damaging missense mutations , 2010, Nature Methods.

[3]  M. DePristo,et al.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. , 2010, Genome research.

[4]  H. Hakonarson,et al.  ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data , 2010, Nucleic acids research.

[5]  Daniel Rios,et al.  Bioinformatics Applications Note Databases and Ontologies Deriving the Consequences of Genomic Variants with the Ensembl Api and Snp Effect Predictor , 2022 .

[6]  M. DePristo,et al.  A framework for variation discovery and genotyping using next-generation DNA sequencing data , 2011, Nature Genetics.

[7]  A. Gonzalez-Perez,et al.  Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel. , 2011, American journal of human genetics.

[8]  Gonçalo R. Abecasis,et al.  The variant call format and VCFtools , 2011, Bioinform..

[9]  Pablo Cingolani,et al.  Using Drosophila melanogaster as a Model for Genotoxic Chemical Mutational Studies with a New Program, SnpSift , 2012, Front. Gene..

[10]  Kenny Q. Ye,et al.  An integrated map of genetic variation from 1,092 human genomes , 2012, Nature.

[11]  Pablo Cingolani,et al.  © 2012 Landes Bioscience. Do not distribute. , 2022 .

[12]  Vanessa E. Gray,et al.  Evolutionary diagnosis method for variants in personal exomes , 2012, Nature Methods.

[13]  P. Stenson,et al.  The Human Gene Mutation Database (HGMD) and Its Exploitation in the Fields of Personalized Genomics and Molecular Evolution , 2012, Current protocols in bioinformatics.

[14]  Aaron R. Quinlan,et al.  GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations , 2013, PLoS Comput. Biol..

[15]  Raymond M. Moore,et al.  SoftSearch: Integration of Multiple Sequence Features to Identify Breakpoints of Structural Variations , 2013, PloS one.

[16]  Jean-Baptiste Cazier,et al.  Choice of transcripts and software has a large effect on variant annotation , 2014, Genome Medicine.

[17]  I. Touitou,et al.  Response to Li and Zhang: infevers, a human gene mutation database for autoinflammatory diseases including disseminated superficial actinic porokeratosis. , 2014, Journal of dermatological science.

[18]  Steven N. Hart,et al.  The Biological Reference Repository (BioR): a rapid and flexible system for genomics annotation , 2014, Bioinform..

[19]  Francisco Salavert,et al.  A web-based interactive framework to assist in the prioritization of disease candidate genes in whole-exome sequencing studies , 2014, Nucleic Acids Res..