Mining Protein Databases

As the amount of available protein data keeps growing, automated knowledge extraction from protein databases is becoming increasingly important. In particular, the 3D geometry of a protein plays a significant role in how molecules interact. As part of our work, we focus on the following problems:

(1) Similarity Models for Proteins according to their 3D Structure. Efficient database algorithms support fast classification of molecules with respect to their geometry. As our experiments demonstrate, the geometric approach is competitive with established classifications such as CATH or FSSP.

(2) Database Support for the 1:n Docking Prediction. Given a query protein, sets of molecules that potentially interact with it are retrieved from a database of 3D protein structures. Several representations of surface segments are investigated, including approximations by parametric surface functions and surface models based on activity maps and feature graphs.

(3) Similarity of Biological Pathways. Docking prediction addresses molecular interactions at the biochemical level. At a higher level, complex metabolic systems consist of networks of biochemical reactions. Comparing biological pathways supports the investigation of relationships among similar organisms. This task requires new similarity models as well as efficient algorithms for query processing.
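
To make the idea of a geometric similarity model in (1) more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the model used in this work. Each structure is summarized by a histogram of atom distances from the centroid, binned into concentric shells, and two structures are compared by the Euclidean distance between their histograms. The function names, the number of shells, and the toy coordinates are all hypothetical.

```python
import math


def shell_histogram(coords, num_shells=8):
    """Normalized histogram of point distances from the centroid.

    Assumption: `coords` is a non-empty list of (x, y, z) tuples, e.g. atom
    positions. Distances are scaled by the largest one, so the descriptor is
    invariant to translation, rotation, and uniform scaling.
    """
    n = len(coords)
    centroid = tuple(sum(p[i] for p in coords) / n for i in range(3))
    dists = [math.dist(p, centroid) for p in coords]
    radius = max(dists) or 1.0  # avoid division by zero for a single point
    hist = [0.0] * num_shells
    for d in dists:
        # The outermost point would land in bin `num_shells`; clamp it.
        idx = min(int(d / radius * num_shells), num_shells - 1)
        hist[idx] += 1.0 / n
    return hist


def histogram_distance(h1, h2):
    """Euclidean distance between two histograms of equal length."""
    return math.dist(h1, h2)


def nearest_neighbors(query, database, k=1):
    """Names of the k database structures most similar to `query`.

    Assumption: `database` maps a name to a coordinate list. This is a
    linear scan; a real system would use an index structure instead.
    """
    qh = shell_histogram(query)
    scored = sorted(
        (histogram_distance(qh, shell_histogram(c)), name)
        for name, c in database.items()
    )
    return [name for _, name in scored[:k]]


# Toy data (hypothetical): a small point set, a translated copy, and a line.
query = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
database = {
    "shifted_copy": [(x + 5, y + 5, z + 5) for x, y, z in query],
    "line": [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)],
}
best = nearest_neighbors(query, database, k=1)
```

In this toy setup, `best` contains the translated copy, because its histogram matches that of the query. Classifying a molecule then amounts to assigning it the class of its nearest neighbors. A descriptor this coarse discards most structural detail and only shows the general retrieval pattern.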