Multiple pattern associations for interpreting structural and functional characteristics of biomolecules

Pattern discovery from a data set can be intractable because both the detection and the interpretation of patterns can be ill-posed and combinatorially explosive. This paper presents an exploratory knowledge-discovery method that uses multiple pattern associations to conjecture structural and functional characteristics of biomolecules. We first treat each site in an ensemble of aligned biomolecules as an attribute and the unit observed at that site as its value. Our method identifies the consistently observed attribute values whose associations with other values deviate significantly from what is expected under the null hypothesis. In addition, the variables (representing molecular sites) that take the detected attribute values as outcomes can be analyzed further. Integrating these associations yields exploratory knowledge for interpreting the detected patterns. During the interpretation phase, the method searches for consistent and relevant descriptions of the data. In experiments on cytochrome c sequences, the discovered statistical patterns relate significantly to the location of a site with respect to the structural characteristics of the molecule and the stability of its function.
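
The detection step described above can be illustrated with a minimal sketch. It assumes that the null hypothesis is independence between two sites and that deviation is measured with the adjusted residual of a contingency table; the abstract does not state either choice, so both are assumptions made here for illustration. The function names, the 1.96 threshold, and the toy alignment are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def adjusted_residuals(sequences, site_a, site_b):
    """Adjusted residual for every pair of values seen at two aligned sites.

    Assumption: the null hypothesis is that the two sites are independent.
    """
    n = len(sequences)
    pair_counts = Counter((s[site_a], s[site_b]) for s in sequences)
    counts_a = Counter(s[site_a] for s in sequences)
    counts_b = Counter(s[site_b] for s in sequences)

    residuals = {}
    for a, row_total in counts_a.items():
        for b, col_total in counts_b.items():
            observed = pair_counts.get((a, b), 0)
            expected = row_total * col_total / n
            # Variance term of the adjusted residual for a two-way table.
            variance = expected * (1 - row_total / n) * (1 - col_total / n)
            if variance > 0:  # undefined when a site never varies
                residuals[(a, b)] = (observed - expected) / sqrt(variance)
    return residuals


def significant_patterns(sequences, site_a, site_b, threshold=1.96):
    """Keep value pairs whose residual exceeds the threshold in magnitude.

    The default threshold is an illustrative choice (two-sided 5% level
    under a normal approximation), not a value from the paper.
    """
    residuals = adjusted_residuals(sequences, site_a, site_b)
    return {pair: d for pair, d in residuals.items() if abs(d) > threshold}


# Toy alignment (hypothetical data): the two sites co-vary perfectly.
toy_alignment = ["AC"] * 10 + ["GT"] * 10
patterns = significant_patterns(toy_alignment, 0, 1)
```

In this toy alignment, the pairs that always occur together get large positive residuals and the pairs that never occur together get large negative ones, so both kinds of association are reported. A real analysis would repeat this over many site pairs and would need to account for multiple testing and small expected counts.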
