Modeling Insertional Mutagenesis Using Gene Length and Expression in Murine Embryonic Stem Cells

Background: High-throughput mutagenesis of the mammalian genome is a powerful means to facilitate analysis of gene function. Gene trapping in embryonic stem cells (ESCs) is the most widely used form of insertional mutagenesis in mammals. However, the rules governing its efficiency are not fully understood, and the effects of vector design on the likelihood of gene-trapping events have not been tested on a genome-wide scale.

Methodology/Principal Findings: In this study, we used public gene-trap data to model gene-trap likelihood. Using the association of gene length and gene expression with gene-trap likelihood, we constructed spline-based regression models that characterize which genes are susceptible and which are resistant to gene-trapping techniques. We report results for three classes of gene-trap vectors, showing that both length and expression are significant determinants of trap likelihood for all vectors. Using our models, we also quantitatively identified hotspots of gene-trap activity, which represent loci where the high likelihood of vector insertion is controlled by factors other than length and expression. These formalized statistical models describe a high proportion of the variance in the likelihood of a gene being trapped by expression-dependent vectors and a lower, but still significant, proportion of the variance for vectors that are predicted to be independent of endogenous gene expression.

Conclusions/Significance: The significant expression and length effects reported here further the understanding of the determinants of vector insertion. Results from this analysis can help identify other determinants of this important biological phenomenon and could assist the planning of large-scale mutagenesis efforts.
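To make the idea of a spline-based regression on gene length and expression concrete, the following is a minimal sketch and not the model fitted in the study. It assumes a binary trapped/untrapped outcome, a logistic link, linear (hinge) splines, arbitrary knot positions, and a small invented data set; the names `features`, `fit_logistic`, and `GENES` are hypothetical.

```python
import math

def hinge_basis(x, knots):
    """Linear spline terms: x itself plus max(0, x - k) for each knot k."""
    return [x] + [max(0.0, x - k) for k in knots]

def features(log_len, log_expr):
    """Intercept plus additive spline terms for length and expression.
    Knot positions are arbitrary illustrative choices."""
    return [1.0] + hinge_basis(log_len, [4.0, 5.0]) + hinge_basis(log_expr, [6.0])

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(w, feats):
    """Modelled probability that a gene with these features is trapped."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, feats)))

def fit_logistic(rows, labels, steps=5000, lr=0.02):
    """Fit weights by batch gradient ascent on the mean log-likelihood."""
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for feats, y in zip(rows, labels):
            err = y - predict(w, feats)
            for j, fj in enumerate(feats):
                grad[j] += err * fj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w

# Invented toy data: (log10 gene length, log2 expression, trapped 0/1).
GENES = [
    (3.0, 2.0, 0), (3.5, 3.0, 0), (3.8, 8.0, 0), (4.0, 4.0, 0), (4.5, 5.0, 0),
    (4.2, 9.0, 1), (4.8, 10.0, 1), (5.0, 7.0, 1), (5.3, 11.0, 1), (5.6, 6.0, 1),
]

rows = [features(length, expr) for length, expr, _ in GENES]
labels = [trapped for _, _, trapped in GENES]
weights = fit_logistic(rows, labels)

# Compare a long, highly expressed gene with a short, weakly expressed one.
p_long_high = predict(weights, features(5.3, 11.0))
p_short_low = predict(weights, features(3.0, 2.0))
print(f"long/high: {p_long_high:.3f}  short/low: {p_short_low:.3f}")
```

In a sketch like this, a gene that is trapped far more often than its fitted probability suggests would be a candidate hotspot in the sense used above, since its insertion likelihood is not explained by length and expression alone.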
