Genome-wide DNA polymorphism analyses using VariScan

Background

DNA sequence polymorphism analysis can provide valuable information on the evolutionary forces shaping nucleotide variation, and gives insight into the functional significance of genomic regions. Ongoing genome projects will radically improve our ability to detect specific genomic regions shaped by natural selection. Currently available methods and software, however, are unsatisfactory for such genome-wide analysis.

Results

We have developed methods for the analysis of DNA sequence polymorphisms at the genome-wide scale. These methods, which have been tested on coalescent-simulated and actual data files from mouse and human, have been implemented in the VariScan software package version 2.0. We have also incorporated a graphical user interface. The main features of this software are: i) exhaustive population-genetic analyses, including those based on coalescent theory; ii) analysis adapted to the shallow data generated by high-throughput genome projects; iii) use of genome annotations to conduct comprehensive analyses separately for different functional regions; iv) identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v) visualization of the results integrated with current genome annotations in commonly available genome browsers.

Conclusion

VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms. The current version implements new algorithms, methods, and capabilities, providing an important tool for exhaustive exploratory analysis of genome-wide DNA polymorphism data.
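To make the sliding-window approach mentioned in the Results concrete, the following is a minimal Python sketch, not VariScan's own code or interface. It computes nucleotide diversity (the average number of pairwise differences per site) in successive windows along an alignment. The function names `nucleotide_diversity` and `sliding_window_pi`, and the assumption of gap-free, equal-length sequences, are illustrative choices and do not come from the paper.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi).

    Assumes equal-length, gap-free sequences (an illustrative simplification).
    """
    n = len(seqs)
    length = len(seqs[0]) if seqs else 0
    if n < 2 or length == 0:
        return 0.0
    n_pairs = n * (n - 1) / 2
    # Count mismatching sites over every unordered pair of sequences.
    total_diffs = sum(
        sum(1 for a, b in zip(s1, s2) if a != b)
        for s1, s2 in combinations(seqs, 2)
    )
    return total_diffs / n_pairs / length


def sliding_window_pi(seqs, width, step):
    """Return (window_start, pi) for each full window along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - width + 1, step):
        window = [s[start:start + width] for s in seqs]
        results.append((start, nucleotide_diversity(window)))
    return results


if __name__ == "__main__":
    alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGA"]  # toy data
    for start, pi in sliding_window_pi(alignment, width=4, step=4):
        print(start, round(pi, 4))
```

Windows with unusually low or high diversity relative to the rest of the alignment are the kind of candidate regions an exploratory scan would flag for closer inspection; window width and step are tuning choices left to the analyst.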
