Paper Information - A System to Integrate and Manipulate Protein Database Using BioPerl and XML

The size, complexity, and number of protein databases have left bioinformatics lagging in its ability to handle this distributed information. Integrating the information from different databases into one database is a challenging problem. Our main research aim is to develop a tool that can be used to access and manipulate protein information from different databases. In our approach, we have integrated databases such as Swiss-Prot, PDB, InterPro, and EMBL, transforming them from flat-file format into relational form using XML and BioPerl. As a result, we show that this tool can search protein information of different sizes stored in a relational database, and that results are retrieved faster than from a flat-file database. A web-based user interface allows users to access or search for protein information in the local database.

Keywords: protein sequence database, relational database, integrated database.
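The flat-file-to-relational step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which used BioPerl and XML. It substitutes Python's standard `sqlite3` module and a hand-written parser, and it assumes a heavily simplified Swiss-Prot-style record layout (two-letter line codes, records terminated by `//`). The sample records, accession numbers, and the `protein` table schema are hypothetical.

```python
import sqlite3

# Hypothetical, simplified Swiss-Prot-style flat-file text (not real entries).
FLAT_FILE = """\
ID   EXAMPLE1_HUMAN
AC   X00001;
DE   Hypothetical example protein 1.
SQ   SEQUENCE
     MKTAYIAKQR QISFVKSHFS
//
ID   EXAMPLE2_MOUSE
AC   X00002;
DE   Hypothetical example protein 2.
SQ   SEQUENCE
     MADEEKLPPG
//
"""


def parse_records(text):
    """Yield one dict per record; each record ends with a '//' line."""
    record, in_seq = {"seq": ""}, False
    for line in text.splitlines():
        if line.startswith("//"):
            yield record
            record, in_seq = {"seq": ""}, False
        elif line.startswith("SQ"):
            in_seq = True  # following lines hold sequence residues
        elif in_seq:
            record["seq"] += line.replace(" ", "")
        else:
            code, value = line[:2], line[5:].strip()
            if code == "ID":
                record["entry_name"] = value.split()[0]
            elif code == "AC":
                # Keep only the first (primary) accession number.
                record["accession"] = value.rstrip(";").split(";")[0]
            elif code == "DE":
                record["description"] = value


def load(conn, records):
    """Create the (hypothetical) protein table and insert parsed records."""
    conn.execute(
        "CREATE TABLE protein ("
        "accession TEXT PRIMARY KEY, entry_name TEXT, "
        "description TEXT, sequence TEXT)"
    )
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?, ?)",
        [(r["accession"], r["entry_name"], r["description"], r["seq"])
         for r in records],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, parse_records(FLAT_FILE))

# Indexed lookup by accession instead of scanning the flat file.
row = conn.execute(
    "SELECT entry_name, sequence FROM protein WHERE accession = ?",
    ("X00001",),
).fetchone()
count = conn.execute("SELECT COUNT(*) FROM protein").fetchone()[0]
print(row, count)
```

Because `accession` is the primary key, a lookup uses the table's index rather than a linear scan of the text file. That is one plausible source of the retrieval speed-up the abstract reports, though the abstract itself does not attribute the speed-up to any specific mechanism.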

Rosni Abdullah | Wahidah Husain | Rosalina Abdul Salam | Zurinahni Zainol
