affy2sv: an R package to pre-process Affymetrix CytoScan HD and 750K arrays for SNP, CNV, inversion and mosaicism calling

Background: The well-known Genome-Wide Association Studies (GWAS) have led to many scientific discoveries using SNP data. Even so, they have not been able to explain the full heritability of complex diseases. Other structural variants, such as copy number variants (CNVs) or DNA inversions, either germ-line or in mosaicism events, are now being studied. We present the R package affy2sv to pre-process Affymetrix CytoScan HD/750K arrays (and also Genome-Wide SNP 5.0/6.0 and Axiom arrays) for structural variant studies.

Results: We illustrate the capabilities of affy2sv using two complete pipelines on real data. The first performs a GWAS and a mosaic alteration detection study; the second detects CNVs and performs inversion calling.

Conclusion: Both examples presented in the article show how affy2sv can be used as part of more complex pipelines that analyze Affymetrix SNP array data in genetic association studies where different types of structural variants are considered.
