CMGSDB: integrating heterogeneous Caenorhabditis elegans data sources using compositional data mining

CMGSDB (Database for Computational Modeling of Gene Silencing) integrates heterogeneous data sources about Caenorhabditis elegans and supports compositional data mining (CDM) across diverse domains. Besides gene, protein and functional annotations, CMGSDB currently unifies information about 531 RNAi phenotypes, obtained from heterogeneous databases, under a single hierarchical scheme. A phenotype browser at the CMGSDB website serves this hierarchy and relates phenotypes to other biological entities. Applying CDM to CMGSDB produces ‘chains’ of relationships in the data by finding two-way connections between sets of biological entities. Chains can, for example, relate the knockdown of a set of genes in an RNAi experiment to the disruption of a pathway, or to a specific gene expression pattern, through a second set of genes that is not directly related to the first. The web interface for CMGSDB is available at https://bioinformatics.cs.vt.edu/cmgs/CMGSDB/ and serves information on individual biological entities as well as details of all chains computed by CDM.
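
To make the idea of a chain concrete, the following is a minimal sketch and not the CDM algorithm itself. It assumes that a "two-way connection" can be approximated by requiring every entity in one set to be linked to every entity in the next set. All gene, phenotype and pathway names, the three toy relations, and the helper functions `invert`, `common_targets` and `chain` are hypothetical; none of them come from CMGSDB.

```python
# Hypothetical toy relations; these pairings are illustrative, not CMGSDB data.
PHENOTYPE_OF = {  # gene -> RNAi phenotypes seen on knockdown
    "geneA": {"embryonic lethal"},
    "geneB": {"embryonic lethal"},
    "geneC": {"slow growth"},
}
INTERACTS = {  # gene -> interacting genes
    "geneA": {"geneX", "geneY"},
    "geneB": {"geneX", "geneY"},
    "geneC": {"geneZ"},
}
PATHWAY_OF = {  # gene -> pathways it participates in
    "geneX": {"pathway P"},
    "geneY": {"pathway P"},
    "geneZ": {"pathway Q"},
}


def invert(relation):
    """Flip a source -> targets mapping into target -> sources."""
    inverted = {}
    for source, targets in relation.items():
        for target in targets:
            inverted.setdefault(target, set()).add(source)
    return inverted


def common_targets(relation, sources):
    """Targets linked to every member of `sources` (all-to-all link)."""
    target_sets = [relation.get(source, set()) for source in sources]
    return set.intersection(*target_sets) if target_sets else set()


def chain(start, relations):
    """Compose set-to-set steps; return None if any step yields no targets."""
    links = [set(start)]
    for relation in relations:
        next_set = common_targets(relation, links[-1])
        if not next_set:
            return None
        links.append(next_set)
    return links


# Phenotype -> knocked-down genes -> interacting genes -> shared pathway.
steps = [invert(PHENOTYPE_OF), INTERACTS, PATHWAY_OF]
result = chain({"embryonic lethal"}, steps)
no_chain = chain({"embryonic lethal", "slow growth"}, steps)
```

In this toy data, `result` links the phenotype to a pathway through a second gene set (`geneX`, `geneY`) that carries no phenotype annotation of its own, which mirrors the indirect relationship described above. `no_chain` is `None` because no single gene shows both phenotypes. A real system would presumably tolerate approximate rather than exact all-to-all links.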
