Recent trends in knowledge and data integration for the life sciences

The biosciences have seen spectacular advances in genomic and proteomic technologies that deliver vast quantities of information on cellular activity. Such technologies are of critical importance to biology, medical science and drug discovery. However, living systems are highly complex, and fully exploiting these technologies requires knowledge at many different levels. Information such as genome sequence data, gene expression data, protein-protein interactions and metabolic pathways is required to understand the complexity of biological processes. The challenge for bioinformatics is to tackle the fragmentation of knowledge by integrating the many sources of heterogeneous information into a coherent entity. A further problem is that the high level of biological complexity and the fragmented nature of biological research have made it difficult to keep fully conversant with the latest research and discoveries. Progress in one area of biology may have implications for other areas, but the dissemination of this knowledge is not straightforward; difficulties such as differences in naming conventions for genes and biological processes have led to confusion and lost productivity. This paper reviews the most recent research aimed at overcoming the fragmentation problem, in which technologies such as text mining and ontologies are used within the knowledge discovery process, and the specific technical challenges these technologies address.
