Classification of Information Fusion Methods in Systems Biology

Biological systems are extremely complex and often involve thousands of interacting components, and many of them remain poorly understood despite extensive study. Over the past few years, however, high-throughput technologies have generated large amounts of biological data, which require advanced bioinformatic algorithms to be interpreted as useful biological information. As a result of these technologies, the study of biological systems has shifted from single components (e.g. genes) to large sets of components (e.g. all genes in an entire genome), with the aim of elucidating their interdependences in various biological processes. There is also a growing need for integrative analysis, in which knowledge about a biological system is derived by data fusion from heterogeneous data sets. Here we review representative examples of bioinformatic methods for fusion-oriented interpretation of multiple heterogeneous biological data sets, and propose a classification into three categories according to the task they address: data extraction, data integration and data fusion. The aim of this classification is to facilitate the exchange of methods between systems biology and other application areas of information fusion.
