FEATnotator: A tool for integrated annotation of sequence features and variation, facilitating interpretation in genomics experiments.

As researchers seek more efficient and democratized use of non-model and expanded model genomic references, easy integration of genomic feature datasets is especially desirable in multidisciplinary research communities. Valuable conclusions are often missed or delayed when researchers relate experimental results to a single reference sequence that lacks integrated pan-genomic and multi-experiment data in accessible formats. Associating genomic positional information, such as results from a wide variety of next-generation sequencing experiments, with annotated reference features, such as genes or predicted protein binding sites, provides context essential for drawing conclusions and guiding ongoing research. When the experimental system includes polymorphic genomic inputs, rapid calculation of the gene structural and protein translational effects of sequence variation relative to the reference can be invaluable. Here we present FEATnotator, a lightweight, fast, and easy-to-use open-source software program that integrates and reports overlap and proximity in genomic information from any user-defined datasets, including those from next-generation sequencing applications. We illustrate use of the tool by summarizing whole-genome sequence variation of a widely used natural isolate of Arabidopsis thaliana in the context of gene models of the reference accession. The previous discovery of a protein-coding deletion influencing root development is rapidly replicated. Appropriate even for investigations of a single gene or of genic regions such as QTL, the comprehensive reports provided by FEATnotator better prepare researchers to interpret their experimental results. The tool is available for download at http://featnotator.sourceforge.net.
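To make the notions of overlap and proximity concrete, the following is a minimal sketch of how two positional features on the same chromosome can be related. It is not FEATnotator's implementation or interface: the names `Feature`, `relate`, and `annotate`, the `max_gap` parameter, and the 1-based inclusive coordinate convention are all assumptions chosen for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Feature(NamedTuple):
    """A hypothetical positional feature (1-based, inclusive coordinates)."""
    chrom: str
    start: int
    end: int
    name: str


def relate(query: Feature, ref: Feature) -> Optional[Tuple[str, int]]:
    """Classify the relationship between two features.

    Returns ("overlap", shared_bases) when the intervals share bases,
    ("proximity", gap_bases) when they do not, and None when the
    features lie on different chromosomes.
    """
    if query.chrom != ref.chrom:
        return None
    shared = min(query.end, ref.end) - max(query.start, ref.start) + 1
    if shared > 0:
        return ("overlap", shared)
    # A non-positive value means no shared bases; its magnitude is the
    # number of bases lying strictly between the two intervals.
    return ("proximity", -shared)


def annotate(queries: List[Feature], refs: List[Feature],
             max_gap: int) -> List[Tuple[str, str, str, int]]:
    """Report every query/reference pair that overlaps or lies within max_gap."""
    report = []
    for q in queries:
        for r in refs:
            rel = relate(q, r)
            if rel is None:
                continue
            kind, value = rel
            if kind == "overlap" or value <= max_gap:
                report.append((q.name, r.name, kind, value))
    return report


if __name__ == "__main__":
    gene = Feature("Chr1", 100, 200, "geneA")
    variants = [
        Feature("Chr1", 150, 150, "snp1"),   # inside the gene
        Feature("Chr1", 250, 300, "peak1"),  # downstream of the gene
    ]
    for row in annotate(variants, [gene], max_gap=100):
        print(row)
```

The nested loop is quadratic in the number of features, which is acceptable for a sketch; genome-scale datasets would normally call for sorted or indexed interval lookups.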
