Paper information - Discovering knowledge in DNA and protein data

Discovering knowledge in DNA and protein data

This research investigates a method for discovering knowledge in structural data. We have implemented the SUBDUE substructure discovery system, which uses the minimum description length principle to discover interesting and repetitive subgraphs in a labeled graph representation. Experiments have shown SUBDUE's applicability in a variety of domains. We are currently applying SUBDUE to both DNA and protein data from the Brookhaven PDB, where SUBDUE was able to find patterns in secondary structure that are both characteristic of and unique to categories of proteins, such as hemoglobin and myoglobin. Ultimately, we plan to use SUBDUE to find structural patterns in functional groups of proteins and the boundaries of genes in DNA.
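The idea of ranking repeated subgraphs by how well they compress a labeled graph can be illustrated with a minimal sketch. This is not the SUBDUE implementation: it assumes a crude size measure (vertex count plus edge count) in place of a true minimum description length encoding, considers only single-edge patterns, and uses a hypothetical toy chain of secondary-structure labels rather than real PDB data. The function name `score_edge_patterns` and the labels are illustrative assumptions.

```python
from collections import defaultdict


def size(num_vertices, num_edges):
    # Assumption: vertices + edges stands in for a real description length in bits.
    return num_vertices + num_edges


def score_edge_patterns(vertex_labels, edges):
    """Score each single-edge pattern by how much it compresses the graph.

    vertex_labels: dict mapping vertex id -> label
    edges: list of (source id, target id, edge label)
    Returns a dict mapping (source label, edge label, target label) -> score,
    where score = size(G) / (size(S) + size(G compressed with S)).
    """
    # Group edges by the labeled pattern they instantiate.
    instances = defaultdict(list)
    for u, v, label in edges:
        instances[(vertex_labels[u], label, vertex_labels[v])].append((u, v))

    total = size(len(vertex_labels), len(edges))
    pattern_size = size(2, 1)  # every pattern here has two vertices and one edge
    scores = {}
    for pattern, occurrences in instances.items():
        # Greedily pick vertex-disjoint instances so each can be collapsed once.
        used = set()
        count = 0
        for u, v in occurrences:
            if u == v or u in used or v in used:
                continue
            used.update((u, v))
            count += 1
        # Collapsing an instance into one vertex removes one vertex and one edge.
        compressed = size(len(vertex_labels) - count, len(edges) - count)
        scores[pattern] = total / (pattern_size + compressed)
    return scores


# Hypothetical toy chain: helix -> strand -> helix -> strand -> helix -> strand
vertex_labels = {0: "helix", 1: "strand", 2: "helix",
                 3: "strand", 4: "helix", 5: "strand"}
edges = [(i, i + 1, "next") for i in range(5)]

scores = score_edge_patterns(vertex_labels, edges)
best = max(scores, key=scores.get)
print(best, scores[best])
```

In this toy chain the helix-to-strand pattern has three disjoint instances and the strand-to-helix pattern only two, so the former compresses the graph more and scores higher. A fuller treatment would grow patterns beyond one edge and encode labels and connectivity in bits, which this sketch does not attempt.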

Lawrence Holder

[1] Lawrence B. Holder, et al. Scalable Discovery of Informative Structural Concepts Using Domain Knowledge, 1996, IEEE Expert.

[2] Lawrence B. Holder, et al. Improving Scalability in a Scientific Discovery System by Exploiting Parallelism, 1997, KDD.