Looking at Cerebellar Malformations through Text-Mined Interactomes of Mice and Humans

We have generated and made publicly available two very large networks of molecular interactions: 49,493 mouse-specific and 52,518 human-specific interactions. These networks were generated through automated analysis of 368,331 full-text research articles and 8,039,972 article abstracts from the PubMed database, using the GeneWays system. Our networks cover a wide spectrum of molecular interaction types, such as bind, phosphorylate, glycosylate, and activate; 207 of these interaction types occur more than 1,000 times in our unfiltered, multi-species data set. Because mouse and human genes are linked through orthology relationships, the human and mouse networks are amenable to straightforward joint computational analysis. Using our newly generated networks and known associations between mouse genes and cerebellar malformation phenotypes, we predicted a number of new associations between genes and five cerebellar phenotypes (small cerebellum, absent cerebellum, cerebellar degeneration, abnormal foliation, and abnormal vermis). Using a battery of statistical tests, we showed that genes associated with cerebellar phenotypes tend to form compact network clusters. Further, we observed that cerebellar malformation phenotypes tend to be associated with highly connected genes. This tendency was stronger for developmental phenotypes and weaker for cerebellar degeneration.
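As an illustration of the kind of test that could support the observation about highly connected genes, the following minimal sketch compares the mean degree of a phenotype-associated gene set with that of randomly drawn gene sets of the same size. It is not the procedure used in the study: the permutation test, the function names, and the toy gene identifiers (`G1` to `G8`) are all assumptions made for the example.

```python
import random
from collections import defaultdict


def degrees(edges):
    """Count distinct neighbours of each gene in an undirected edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {gene: len(n) for gene, n in neighbours.items()}


def mean_degree_pvalue(edges, phenotype_genes, n_perm=10000, seed=0):
    """One-sided permutation test (an assumed, illustrative choice).

    Returns the observed mean degree of the phenotype genes present in
    the network and the fraction of random gene sets of equal size whose
    mean degree is at least as large.
    """
    deg = degrees(edges)
    genes = sorted(deg)  # sorted for reproducible sampling
    present = [g for g in phenotype_genes if g in deg]
    k = len(present)
    if k == 0:
        raise ValueError("no phenotype gene occurs in the network")
    observed = sum(deg[g] for g in present) / k

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, k)
        if sum(deg[g] for g in sample) / k >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy network: G1 is a hub, the other genes have one partner.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
             ("G1", "G5"), ("G1", "G6"), ("G7", "G8")]
toy_phenotype_genes = {"G1"}

observed_mean, p_value = mean_degree_pvalue(toy_edges, toy_phenotype_genes)
print(f"mean degree = {observed_mean:.1f}, empirical p = {p_value:.3f}")
```

In a real analysis the edge list would come from the mouse or human network and the gene set from the phenotype annotations; a degree-based test of this kind says nothing about whether the genes cluster together, which would need a separate statistic.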
