Paper information - Multi-genome alignment for quality control and contamination screening of next-generation sequencing data


The availability of massive amounts of DNA sequence data, from thousands of genomes even in a single project, has had a huge impact on our understanding of biology, but it also creates several problems for the biologists carrying out those experiments. Bioinformatic analysis of sequence data is perhaps the most obvious challenge, but upstream of this, even basic quality control of sequence run performance is challenging for many users given the volume of data. Users need to be able to assess run quality efficiently so that only high-quality data are passed through to computationally-, financially-, and time-intensive processes. There is a clear need to make human review of sequence data as efficient as possible. The multi-genome alignment tool described here presents next-generation sequencing run data in visual and tabular formats, simplifying assessment of run yield and quality; it also reports some sample-based quality metrics and screens for contamination from adapter sequences and from species other than the one being sequenced.

James Hadfield | Matthew D. Eldridge
