Paper information - AGILE: an assembled genome mining pipeline

AGILE: an assembled genome mining pipeline

SUMMARY Several limiting factors cause traditional genome annotation tools to fail or perform sub-optimally when detecting coding sequences in poor-quality genome assemblies and genome reports. As a result, potentially useful data are accessible only to those with specific skills and expertise in assembly and annotation. We present an Assembled-Genome mIning pipeLinE (AGILE), written in Perl, that combines bioinformatics tools with a number of steps to overcome the limitations imposed by such assemblies when applied to highly fragmented genomes. Our methodology uses user-specified query genes from a closely related species to mine and annotate coding sequences that would traditionally be missed by standard annotation packages. Despite a focus on mammalian genomes, the generalized implementation means that it may be applied to any genome assembly, providing a means for non-specialists to gather gene sequences for downstream analyses.

AVAILABILITY AND IMPLEMENTATION Source code and associated files are available at https://github.com/batlabucd/GenomeMining and https://bitbucket.org/BatlabUCD/genomemining/src. Singularity and VirtualBox images are available at https://figshare.com/s/a0004bf93dc43484b0c0.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

Emma C. Teeling | Graham M. Hughes
