Comparing DNA integration site clusters with scan statistics

MOTIVATION Gene therapy with retroviral vectors can cause adverse effects when those vectors integrate in sensitive genomic regions. Vectors that target sensitive regions less frequently are preferred, which motivates the search for localized clusters of integration sites and the comparison of the clusters formed by different vectors. Scan statistics allow the discovery of spatial differences in clustering and the calculation of false discovery rates, providing statistical methods for comparing retroviral vectors.

RESULTS A scan statistic for comparing two vectors using multiple window widths is proposed, together with software to detect differences in clustering and compute false discovery rates. Application to several sets of experimentally determined HIV integration sites demonstrates the software. Simulated datasets of various sizes and signal strengths are used to determine the power to discover clusters and to evaluate a convenient lower bound. Together, the method and software provide a toolkit for planning evaluations of new gene therapy vectors.

AVAILABILITY AND IMPLEMENTATION The geneRxCluster R package, which contains a simple tutorial and usage hints, is available from http://www.bioconductor.org.
