Whole genome re-sequencing: lessons from unmapped reads

Unmapped reads are often discarded in the analysis of whole genome re-sequencing data, even though they can reveal new biological information. In this paper, we investigated these reads in the re-sequencing data of thirty-three aphid genomes. The unmapped reads of each individual were retrieved after mapping its set of reads against the Acyrthosiphon pisum reference genome, its mitochondrial genome and several known or putative symbiont genomes. These sets of unmapped reads were then cross-compared, which showed that a significant number of these sequences were conserved among individuals, especially among individuals adapted to the same host plant, suggesting that they may share crucial functional material. Moreover, analysis of the contigs assembled from the unmapped reads, pooled by biotype, allowed us to discover putative novel sequences absent from the reference genomes and highlighted the possible presence of other symbionts in the pea aphid genome whose existence was not previously known. In conclusion, this study emphasizes that a default strategy (e.g., for the mapping) may lead to the loss of important information and must be accompanied by specific analyses adapted to the biological model.
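The retrieval step can be illustrated with a minimal sketch. It is not the authors' pipeline: it only assumes that the mapping results are available as SAM text, in which bit 0x4 of the FLAG field marks a read as unmapped. The function name `unmapped_reads` and the example records are hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads(sam_lines):
    """Yield (read_name, sequence) for each SAM record flagged as unmapped."""
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns 1, 2 and 10 of a SAM record are QNAME, FLAG and SEQ.
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & UNMAPPED_FLAG:
            yield name, seq


# Made-up records for illustration only (not data from the study).
EXAMPLE_SAM = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",   # mapped, forward strand
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",          # unmapped
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",         # paired, read and mate unmapped
    "r4\t16\tchr1\t200\t60\t4M\t*\t0\t0\tCCAA\tIIII",  # mapped, reverse strand
]

if __name__ == "__main__":
    for name, seq in unmapped_reads(EXAMPLE_SAM):
        print(name, seq)
```

In practice, a dedicated tool would typically do this filtering on compressed alignment files. The sketch only shows the criterion that separates unmapped reads from mapped ones.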
