Improved linkage analysis of Quantitative Trait Loci using bulk segregants unveils a novel determinant of high ethanol tolerance in yeast

Background: Bulk segregant analysis (BSA) coupled to high-throughput sequencing is a powerful method to map genomic regions associated with phenotypes of interest. It relies on crossing two parents, one inferior and one superior for a trait of interest. Segregants displaying the trait of the superior parent are pooled, and their DNA is extracted and sequenced. Genomic regions linked to the trait of interest are identified by searching the pool for overrepresented alleles that normally originate from the superior parent. BSA data analysis is non-trivial due to sequencing, alignment and screening errors.

Results: To increase the power of the BSA technology and obtain a better distinction between spuriously and truly linked regions, we developed EXPLoRA (EXtraction of over-rePresented aLleles in BSA), an algorithm for BSA data analysis that explicitly models the dependency between neighboring marker sites by exploiting the properties of linkage disequilibrium through a Hidden Markov Model (HMM). Reanalyzing a BSA dataset for high ethanol tolerance in yeast allowed us to reliably identify QTLs linked to this phenotype that could not be identified with statistical significance in the original study. Experimental validation of one of the least pronounced linked regions, by identifying its causative gene VPS70, confirmed the potential of our method.

Conclusions: EXPLoRA performs at least as well as the state of the art and is robust even at low signal-to-noise ratios, i.e. when the true linkage signal is diluted by sampling or screening errors, or when few segregants are available.
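The idea of modeling the dependency between neighboring marker sites with an HMM can be illustrated with a minimal sketch. This is not EXPLoRA's actual model: the two hidden states ("unlinked" and "linked"), the binomial emission model, and every parameter value (`p_unlinked`, `p_linked`, `stay`, `prior_linked`) are assumptions chosen only for illustration, and the function name `linked_posteriors` is hypothetical. The sketch takes per-marker read counts for the superior-parent allele and returns, for each marker, the posterior probability of the "linked" state computed with the forward-backward algorithm.

```python
import math


def binom_logpmf(k, n, p):
    """Log-probability of k successes in n trials with success probability p."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1 - p)


def linked_posteriors(counts, p_unlinked=0.5, p_linked=0.9,
                      stay=0.95, prior_linked=0.1):
    """Posterior probability of the 'linked' state at each marker.

    counts: list of (superior_allele_reads, total_reads), ordered along
    the chromosome. All parameter defaults are illustrative assumptions.
    """
    freqs = (p_unlinked, p_linked)            # state 0 = unlinked, 1 = linked
    init = (1 - prior_linked, prior_linked)
    trans = ((stay, 1 - stay), (1 - stay, stay))

    # Emission likelihood of each marker's counts under each state.
    emit = [[math.exp(binom_logpmf(k, n, p)) for p in freqs]
            for k, n in counts]

    # Forward pass, rescaled at every marker to avoid underflow.
    fwd, prev = [], None
    for e in emit:
        if prev is None:
            a = [init[s] * e[s] for s in range(2)]
        else:
            a = [e[s] * sum(prev[r] * trans[r][s] for r in range(2))
                 for s in range(2)]
        z = sum(a)
        prev = [x / z for x in a]
        fwd.append(prev)

    # Backward pass, also rescaled.
    bwd = [[1.0, 1.0] for _ in emit]
    for t in range(len(emit) - 2, -1, -1):
        b = [sum(trans[s][r] * emit[t + 1][r] * bwd[t + 1][r]
                 for r in range(2)) for s in range(2)]
        z = sum(b)
        bwd[t] = [x / z for x in b]

    # Combine both passes and normalize per marker.
    post = []
    for f, b in zip(fwd, bwd):
        w = [f[s] * b[s] for s in range(2)]
        post.append(w[1] / sum(w))
    return post


# Hypothetical read counts: marker 2 is an isolated outlier (30/40),
# markers 4 to 6 form a run of consistently high frequencies.
counts = [(20, 40), (30, 40), (19, 40), (35, 40),
          (37, 40), (36, 40), (21, 40), (20, 40)]
for (k, n), p in zip(counts, linked_posteriors(counts)):
    print(f"{k}/{n}  P(linked) = {p:.3f}")
```

With these made-up counts, the isolated marker at 30/40 keeps a low posterior because its neighbors look unlinked, while the run of three high-frequency markers is assigned to the linked state. That is the intuition behind using neighboring sites to separate spurious from true linkage; a real analysis would need emission and transition models fitted to the data.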
