Protein structure and fold prediction using tree-augmented naive Bayesian classifier.
Computer-based techniques have become essential for determining the structure class and fold class of proteins, given the large volume of data. Several techniques based on sequence similarity, neural networks, SVMs, etc. have been applied. This paper presents a framework using Tree-Augmented Networks (TAN), which are based on the theory of learning Bayesian networks but make less restrictive assumptions than naive Bayesian networks. To enhance TAN's performance, the data are pre-processed by feature discretization and the outputs are post-processed using a Mean Probability Voting (MPV) scheme. The advantage of the Bayesian approach over other learning methods is that the network structure is intuitive. In addition, one can read off the TAN structure probabilities to determine the significance of each feature (say, hydrophobicity) for each class, which helps to further understand the mystery of protein structure. Experimental results and comparisons with other works over two databases show the effectiveness of our TAN-based framework. The idea is implemented as the BAYESPROT web server, which is available at http://www-appn.comp.nus.edu.sg/-bioinfo/bayesprot/Default.htm.
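The pipeline described above (discretize features, learn a TAN structure, combine probabilities by voting) can be illustrated with a minimal Python sketch. It is not the BAYESPROT implementation. The abstract does not say which discretization method is used or how MPV is defined, so the equal-width binning and the plain averaging of class-probability vectors below are assumptions, and all function names are hypothetical. The tree step follows the commonly used TAN construction: weight each feature pair by its conditional mutual information given the class, then take a maximum-weight spanning tree.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins=3):
    """Discretize continuous values into equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def conditional_mutual_information(x, y, c):
    """Estimate I(X; Y | C) in nats from empirical counts."""
    n = len(c)
    n_xyc = Counter(zip(x, y, c))
    n_xc = Counter(zip(x, c))
    n_yc = Counter(zip(y, c))
    n_c = Counter(c)
    total = 0.0
    for (xv, yv, cv), count in n_xyc.items():
        # P(x,y|c) / (P(x|c) P(y|c)) = n_xyc * n_c / (n_xc * n_yc)
        ratio = count * n_c[cv] / (n_xc[(xv, cv)] * n_yc[(yv, cv)])
        total += (count / n) * math.log(ratio)
    return total


def tan_tree(features, labels, root=0):
    """Return {child: parent} feature edges of a maximum-weight spanning tree.

    features is a list of discretized feature columns. In a TAN, every
    feature also has the class variable as a parent; that edge is implicit.
    """
    k = len(features)
    weight = {
        (i, j): conditional_mutual_information(features[i], features[j], labels)
        for i in range(k)
        for j in range(i + 1, k)
    }
    parent = {}
    in_tree = {root}
    while len(in_tree) < k:  # Prim's algorithm, grown outward from the root
        i, j = max(
            ((a, b) for a in in_tree for b in range(k) if b not in in_tree),
            key=lambda e: weight[(min(e), max(e))],
        )
        parent[j] = i
        in_tree.add(j)
    return parent


def mean_probability_vote(prob_rows):
    """Average class-probability vectors (assumed MPV rule); return (winner, mean)."""
    n = len(prob_rows)
    mean = [sum(col) / n for col in zip(*prob_rows)]
    return max(range(len(mean)), key=mean.__getitem__), mean
```

Calling `tan_tree` on a list of discretized feature columns and a label list yields the feature-to-feature edges. A full classifier would additionally estimate the conditional probability tables P(feature | parent feature, class) along those edges, which this sketch omits.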