Bioinformatics of High-Throughput Insertional Mutagenesis

Bioinformatics plays a critical role in handling the large amounts of sequence data generated by insertional mutagenesis. First, computational approaches are used to develop rapid sequence analysis pipelines and biological databases. A pipeline rapidly maps the millions of reads from an insertional mutagenesis screen to genomic locations and annotates them with their target genes; the resulting sequence-based data are stored and managed in a database so that the information can be shared with the scientific community. Second, statistical techniques are used to distinguish true common insertion sites (loci hit by insertions in multiple tumors, and therefore candidate loci for cancer genes) from background insertions in large-scale screens. Finally, advanced data mining techniques, namely pathway and network analysis, are used to give further biological meaning to insertion sites by identifying interactions among genes in cancer. In this chapter, we discuss the features of these three topics and address their future roles: (1) development of sequence analysis pipelines and databases, (2) detection of common insertion sites, and (3) network and pathway analysis of insertion sites.
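As a rough illustration of the second topic, the sketch below flags candidate common insertion sites by counting the distinct tumors with an insertion inside a fixed window and comparing that count with a Poisson background. It is not the chapter's method: the function names, the 30 kb window, the significance threshold, and the uniform-background assumption are all hypothetical choices made for illustration. No multiple-testing correction is applied, and overlapping windows are reported separately.

```python
import math
from collections import defaultdict


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def candidate_cis(insertions, genome_size, window=30_000, alpha=0.05):
    """Flag windows hit in several tumors more often than expected by chance.

    insertions: iterable of (chrom, position, tumor_id) tuples.
    Returns a list of (chrom, start, end, n_tumors, p_value).
    Assumes (hypothetically) that background insertions are uniform.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, tumor in insertions:
        by_chrom[chrom].append((pos, tumor))

    n_total = sum(len(sites) for sites in by_chrom.values())
    # Expected number of insertions per window under the uniform background.
    lam = n_total * window / genome_size

    hits = []
    for chrom, sites in by_chrom.items():
        sites.sort()
        for i, (start, _) in enumerate(sites):
            # Insertions within `window` bp downstream of this one.
            in_window = [(p, t) for p, t in sites[i:] if p - start <= window]
            tumors = {t for _, t in in_window}
            if len(tumors) < 2:
                continue  # a common insertion site needs multiple tumors
            p_value = poisson_tail(len(tumors), lam)
            if p_value < alpha:
                hits.append((chrom, start, in_window[-1][0], len(tumors), p_value))
    return hits
```

Counting distinct tumors rather than raw reads keeps a single clonally expanded insertion from dominating the count. A real screen would also need a background model that reflects the integration biases of the vector.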
