Sumomo: A Protein Surface Motif Mining Module

Protein surface motifs, which can be defined as commonly appearing patterns of shape and physical properties in protein molecular surfaces, can be considered "possible active sites". We have developed SUMOMO, a system for mining surface motifs that consists of two phases: surface motif extraction and surface motif filtering. In the extraction phase, a given set of protein molecular surface data is divided into small surfaces called unit surfaces. Several common unit surfaces are extracted as candidate motifs, and these candidates are then repeatedly merged into surface motifs. However, this phase extracts a large number of surface motifs, which makes it difficult to determine which of them are significant enough to be considered active sites. Since the active sites of proteins with a particular function have similar shapes and physical properties, proteins can be classified based on the similarity of their local surfaces. Thus, in the filtering phase, local surfaces extracted from proteins of the same function group are considered significant motifs, and the rest are filtered out. The proposed method was applied to discover surface motifs from 15 proteins belonging to four function groups. Motifs corresponding to all four known functional sites were recognised.
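The extraction phase (divide surfaces into unit surfaces, keep the common ones as candidate motifs, then merge them repeatedly) can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: each surface is a list of points carrying a discretised shape/property label; a unit surface is a point plus its neighbours within a fixed radius, summarised as a sorted tuple of labels; and merging is done level-wise in the style of Apriori frequent-itemset mining. The names `unit_surfaces`, `support`, and `mine_motifs`, the labels, the radius, and the support threshold are all hypothetical. The sketch also ignores the spatial arrangement of merged units, which a real surface-matching step would have to check.

```python
import math
from itertools import combinations

# Hypothetical surface point: (x, y, z, label), where the label is an assumed
# discretised shape/physical-property class such as "convex+" or "concave-".
Point = tuple[float, float, float, str]


def unit_surfaces(points: list[Point], radius: float) -> set[tuple[str, ...]]:
    """Divide one surface into unit surfaces: each point plus its neighbours
    within `radius`, summarised as a sorted tuple of their labels."""
    units = set()
    for cx, cy, cz, _ in points:
        labels = sorted(
            lab for x, y, z, lab in points
            if math.dist((cx, cy, cz), (x, y, z)) <= radius
        )
        units.add(tuple(labels))
    return units


def support(motif: frozenset, proteins: dict[str, set]) -> int:
    """Count the proteins whose unit-surface set contains every unit in `motif`."""
    return sum(1 for units in proteins.values() if motif <= units)


def mine_motifs(proteins: dict[str, set], min_support: int) -> list[frozenset]:
    """Apriori-style merging: start from common single units, then repeatedly
    join motifs that differ by one unit, keeping those with enough support."""
    all_units = set().union(*proteins.values())
    level = [frozenset([u]) for u in all_units
             if support(frozenset([u]), proteins) >= min_support]
    motifs = list(level)
    while level:
        # Join step: union of two k-unit motifs that yields k + 1 units.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c, proteins) >= min_support]
        motifs.extend(level)
    return motifs


if __name__ == "__main__":
    surfaces = {
        "P1": [(0, 0, 0, "convex+"), (1, 0, 0, "concave-"), (5, 5, 5, "flat0")],
        "P2": [(0, 0, 0, "convex+"), (1, 0, 0, "concave-"), (9, 9, 9, "flat0")],
        "P3": [(0, 0, 0, "convex+"), (8, 8, 8, "flat0")],
    }
    proteins = {n: unit_surfaces(p, radius=1.5) for n, p in surfaces.items()}
    for motif in sorted(mine_motifs(proteins, min_support=2), key=len):
        print(sorted(motif))
```

In this toy input only P1 and P2 share the convex/concave patch, so with `min_support=2` the sketch keeps that patch, the flat unit present in all three proteins, and their two-unit merge. A level-wise join was chosen because it mirrors "repetitively merged" while only growing motifs that are already common.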

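The filtering idea, keeping only motifs that come from proteins of the same function group, can likewise be sketched. This assumes each extracted motif comes with the set of proteins it was found in and each protein has a known function-group label. The purity criterion (the share of a motif's proteins that belong to its most common group) and the names `filter_motifs`, `min_purity`, and `min_hits` are illustrative assumptions, not the scoring actually used by SUMOMO.

```python
from collections import Counter


def filter_motifs(
    motif_hits: dict[str, set[str]],
    groups: dict[str, str],
    min_purity: float = 1.0,  # assumed: 1.0 means all hits from one group
    min_hits: int = 2,        # assumed: ignore motifs seen in too few proteins
) -> dict[str, str]:
    """Keep motifs found mainly in proteins of a single function group.

    motif_hits maps a (hypothetical) motif id to the proteins it was extracted
    from; groups maps a protein id to its function group. Returns a mapping
    from each surviving motif id to its dominant group.
    """
    kept = {}
    for motif, proteins in motif_hits.items():
        if len(proteins) < min_hits:
            continue  # too few occurrences to judge group specificity
        counts = Counter(groups[p] for p in proteins)
        group, n = counts.most_common(1)[0]
        if n / len(proteins) >= min_purity:
            kept[motif] = group
    return kept
```

For example, with proteins P1 and P2 in group "A" and P3 and P4 in group "B", a motif hit in {P1, P2} survives with group "A", while one hit in {P1, P3} is filtered out under the default strict purity. Relaxing `min_purity` trades specificity for recall.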