From Informatics to Bioinformatics

Informatics has helped launch molecular biology into the genomic era, and it appears certain that informatics will remain a major factor in the success of molecular biology in the post-genome era. In this paper, we describe advances in data integration and data mining technologies that are relevant to molecular biology and the biomedical sciences. In particular, we discuss past and present research results on (a) the taming of autonomous, heterogeneous, distributed data sources, (b) the prediction of immunogenic peptides, (c) the discovery of gene structure features, (d) the classification of gene expression profiles, and (e) the extraction of protein interaction information from the literature.
