A Survey on Data Integration in Bioinformatics

The need for data integration is widely acknowledged in bioinformatics. Many large biological databanks are now available worldwide, each in its own format. Characterizing or mapping data across several of these sources requires integrating all related data fields. The integration problem can be addressed in a variety of ways; some approaches are widely used, while others are less common because they fail to meet the basic requirements of data integration. In this paper, we discuss three techniques for data integration: the federated database system approach, the data warehousing approach, and the link-driven approach. Each approach has its strengths and weaknesses, so it is important to identify which one best suits a given user's needs. We also discuss some database systems that use these three approaches to solve the data integration problem.
