Transcriptome sequencing and high‐resolution melt analysis advance single nucleotide polymorphism discovery in duplicated salmonids

Until recently, single nucleotide polymorphism (SNP) discovery in nonmodel organisms faced many challenges, often depending upon a targeted‐gene approach and Sanger sequencing of many individuals. The advent of next‐generation sequencing technologies has dramatically improved discovery, but validating and testing SNPs for use in population studies remain labour intensive. Here, we detail a SNP discovery and validation pipeline that incorporates 454 pyrosequencing, high‐resolution melt analysis (HRMA) and 5′ nuclease genotyping. We generated 4.59 × 10^8 bp of redundant sequence from transcriptomes of two individual chum salmon, a highly valued species across the Pacific Rim. Nearly 26 000 putative SNPs were identified, some as heterozygotes and some as homozygous for different nucleotides in the two individuals. For validation, we selected 202 templates containing single putative SNPs and conducted HRMA on 10 individuals from each of 19 populations from across the species range. Finally, 5′ nuclease genotyping validated 37 SNPs that conformed to Hardy–Weinberg equilibrium expectations. Putative SNPs expressed as heterozygotes in an ascertainment individual had more than twice the validation rate of those homozygous for different alleles in the two fish, suggesting that many of the latter may have been paralogous sequence variants. Overall, this validation rate of 37/202 suggests that we have found more than 4500 templates containing SNPs for use in this population set. We anticipate using this pipeline to significantly expand the number of SNPs available for studies of population structure and mixture analyses as well as for studies of adaptive genetic variation in nonmodel organisms.
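
The extrapolation from 37 validated SNPs to "more than 4500 templates" can be checked with a short calculation. The sketch below assumes simple proportional scaling of the 37/202 validation rate to the full candidate pool; the abstract does not state the exact method or the exact number of single-SNP templates, so the pool size of 26 000 (the count of putative SNPs) is used only as an approximate upper bound, and the function names are illustrative.

```python
def validation_rate(validated: int, tested: int) -> float:
    """Fraction of tested templates that yielded a validated SNP."""
    return validated / tested


def projected_validated(validated: int, tested: int, pool: float) -> float:
    """Scale the observed validation rate to a larger candidate pool.

    Assumes the tested templates are representative of the pool,
    which the abstract does not guarantee.
    """
    return validation_rate(validated, tested) * pool


def pool_needed(validated: int, tested: int, target: float) -> float:
    """Candidate pool size required to expect `target` validated SNPs."""
    return target / validation_rate(validated, tested)


# Figures from the abstract.
VALIDATED = 37
TESTED = 202
PUTATIVE_SNPS = 26_000  # "nearly 26 000"; treated here as an upper bound

rate = validation_rate(VALIDATED, TESTED)
projected = projected_validated(VALIDATED, TESTED, PUTATIVE_SNPS)
minimum_pool = pool_needed(VALIDATED, TESTED, 4500)

print(f"validation rate: {rate:.1%}")
print(f"projected validated SNPs from {PUTATIVE_SNPS} candidates: {projected:.0f}")
print(f"pool size needed to expect 4500 validated SNPs: {minimum_pool:.0f}")
```

Under these assumptions the rate is about 18%, and the stated figure of more than 4500 is consistent with a candidate pool somewhat smaller than the full set of putative SNPs, as would be expected if only templates containing a single putative SNP were counted.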
