Matapax: An Online High-Throughput Genome-Wide Association Study Pipeline

High-throughput sequencing and genotyping methods are dramatically increasing the number of observable intraspecies genetic differences that can be exploited as genetic markers. In addition, automated phenotyping platforms and “omics” profiling technologies are enlarging the set of quantifiable macroscopic and molecular traits at an ever-increasing pace. Together, these two lines of technological advance create unparalleled opportunities to identify, via association studies, candidate gene regions and, ideally, even single genes responsible for the observed variation in a particular trait. However, this new potential is not yet sufficiently matched by software that makes this wealth of genotype/phenotype information easy to exploit. We have developed Matapax, a Web-based platform, to address this need. Initially, we built the infrastructure to support association studies in Arabidopsis (Arabidopsis thaliana) based on several genotyping efforts covering up to 1,375 Arabidopsis accessions. Based on user-supplied trait information, associated single-nucleotide polymorphism (SNP) markers and SNP-harboring or SNP-neighboring genes are identified using both the GAPIT and EMMA libraries developed for R. Further interrogation is facilitated by displaying candidate regions and genes in a genome browser and by providing relevant annotation information. In the future, we plan to extend the range of supported organisms to other plant species as more genotype/phenotype information becomes available. Matapax is freely available at http://matapax.mpimp-golm.mpg.de and can be accessed using any internet browser.
