Inferring higher functional information for RIKEN mouse full-length cDNA clones with FACTS.

FACTS (Functional Association/Annotation of cDNA Clones from Text/Sequence Sources) is a semiautomated knowledge discovery and annotation system that integrates molecular function information derived from sequence analysis results (sequence inferred) with functional information extracted from text (text inferred). Text-inferred information was obtained from keyword-based retrievals of MEDLINE abstracts and by matching gene or protein names to OMIM, BIND, and DIP database entries. Using FACTS, we found that 47.5% of the 60,770 RIKEN mouse cDNA FANTOM2 clone annotations were informative for text searches. MEDLINE queries yielded molecular interaction-containing sentences for 23.1% of the clones. When disease MeSH and GO terms were matched with retrieved abstracts, 22.7% of clones were associated with potential diseases, and 32.5% with GO identifiers. A substantial fraction (23.5%) of disease MeSH-associated clones also had a hereditary disease association (OMIM Morbidmap). Inferred neoplastic and nervous system diseases accounted for 49.6% and 36.0% of disease MeSH-associated clones, respectively. A comparison of sequence-based GO assignments with informative text-based GO assignments revealed that for 78.2% of clones the two methods gave identical GO assignments, whereas for 21.8% of clones the assignments differed. In contrast, only 28.5% of clones had identical sequence-based and text-based OMIM assignments. Sequence, sentence, and term-based functional associations are included in the FACTS database (http://facts.gsc.riken.go.jp/), which permits results to be annotated and explored through web-accessible keyword and sequence search interfaces. The FACTS database will be a critical tool for investigating the functional complexity of the mouse transcriptome, cDNA-inferred interactome (molecular interactions), and pathome (pathologies).
