Indexcov: fast coverage quality control for whole-genome sequencing

Abstract: The BAM and CRAM formats provide a supplementary linear index that facilitates rapid access to sequence alignments in arbitrary genomic regions. Comparing consecutive entries in a BAM or CRAM index allows one to infer the number of alignment records in each genomic region, which serves as an effective proxy for sequencing depth in that region. Based on these properties, we have developed indexcov, an efficient estimator of whole-genome sequencing coverage that rapidly identifies samples with aberrant coverage profiles, reveals large-scale chromosomal anomalies, recognizes potential batch effects, and infers the sex of a sample. Indexcov is available at https://github.com/brentp/goleft under the MIT license.
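
The idea of comparing consecutive index entries can be illustrated with a minimal sketch. It is not indexcov's actual implementation. It assumes the linear-index entries for one chromosome have already been read into a list of BAM virtual file offsets, one per 16,384-base window, and it treats the difference between the compressed-file offsets of neighboring entries as the per-window depth proxy, scaled by the median so that typical windows sit near 1. The function names and the median scaling are hypothetical choices made for illustration.

```python
from statistics import median

# Width of one linear-index window in the BAI format (assumed input layout).
TILE_WIDTH = 16_384


def compressed_offset(virtual_offset: int) -> int:
    """Return the compressed-file part of a BAM virtual offset.

    A virtual offset packs the compressed block offset in the upper 48 bits
    and the within-block offset in the lower 16 bits.
    """
    return virtual_offset >> 16


def scaled_depth_proxy(virtual_offsets):
    """Estimate relative depth per window from consecutive index entries.

    Hypothetical helper: the amount of compressed data between two
    neighboring entries stands in for the number of alignments in the
    window, and is divided by the median of the non-empty windows.
    """
    offsets = [compressed_offset(v) for v in virtual_offsets]
    sizes = [b - a for a, b in zip(offsets, offsets[1:])]
    nonzero = [s for s in sizes if s > 0]
    if not nonzero:
        return [0.0 for _ in sizes]
    mid = median(nonzero)
    return [s / mid for s in sizes]


if __name__ == "__main__":
    # Made-up offsets: the third window holds twice as much data as the others.
    entries = [0 << 16, 100 << 16, 200 << 16, 400 << 16, 500 << 16]
    for i, value in enumerate(scaled_depth_proxy(entries)):
        print(f"{i * TILE_WIDTH}\t{(i + 1) * TILE_WIDTH}\t{value:.2f}")
```

Under these assumptions, a window whose scaled value is well above or below 1 would be a candidate for closer inspection, and the same per-window values could be averaged per chromosome to compare X and Y coverage between samples.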
