Non-parametric classification of protein secondary structures

Proteins were classified into their families using a classification tree method based on the coefficients of variation of physico-chemical and geometrical properties of their secondary structures. The method uses the increase in purity obtained when a node is split into two subnodes as its splitting criterion, and the size of the tree is controlled by a threshold on the improvement in the apparent misclassification rate (AMR) of the tree after each splitting step. The classification tree method appears to reproduce structural groupings similar to those obtained by dynamic programming. For comparison, we also applied two other methods, neural networks and support vector machines, and found that the classification tree method performs better at classifying proteins into their families. The presented algorithm might be suitable for a rapid preliminary classification of proteins into their corresponding families.
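
The splitting and stopping rules described above can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: purity is measured with Gini impurity (the abstract does not name the purity measure), the AMR improvement of a split is computed as the drop in training misclassifications divided by the total number of training samples, and all names (`coefficient_of_variation`, `grow`, `predict`, `min_amr_gain`) and the toy feature values are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population std; an assumption)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def gini(labels):
    """Gini impurity of a node; 0.0 means the node is pure (assumed purity measure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def errors(labels):
    """Training samples misclassified when the node predicts its majority class."""
    return len(labels) - max(Counter(labels).values())


def partition(rows, labels, f, t):
    """Split samples on feature f: values <= t go left, the rest go right."""
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return ([r for r, _ in left], [y for _, y in left],
            [r for r, _ in right], [y for _, y in right])


def best_split(rows, labels):
    """Return (gain, feature, threshold) with the largest purity increase, or None."""
    best, parent, n = None, gini(labels), len(labels)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Every observed value except the largest is a candidate threshold,
        # so both children are guaranteed to be non-empty.
        for t in values[:-1]:
            _, ly, _, ry = partition(rows, labels, f, t)
            gain = parent - (len(ly) * gini(ly) + len(ry) * gini(ry)) / n
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def grow(rows, labels, n_total, min_amr_gain):
    """Grow a tree; stop when a split improves the AMR by less than min_amr_gain."""
    majority = Counter(labels).most_common(1)[0][0]
    split = best_split(rows, labels) if len(set(labels)) > 1 else None
    if split is None:
        return majority  # leaf: pure node or no usable threshold
    _, f, t = split
    lx, ly, rx, ry = partition(rows, labels, f, t)
    amr_gain = (errors(labels) - errors(ly) - errors(ry)) / n_total
    if amr_gain < min_amr_gain:
        return majority  # leaf: split does not reduce the AMR enough
    return (f, t,
            grow(lx, ly, n_total, min_amr_gain),
            grow(rx, ry, n_total, min_amr_gain))


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Toy data: each row holds made-up coefficients of variation for two properties.
rows = [[0.10, 0.50], [0.12, 0.55], [0.40, 0.52], [0.45, 0.48]]
labels = ["family_A", "family_A", "family_B", "family_B"]
tree = grow(rows, labels, n_total=len(rows), min_amr_gain=0.05)
print(predict(tree, [0.11, 0.50]))
```

Raising `min_amr_gain` yields a smaller tree, since fewer splits clear the threshold. In practice each feature would be the coefficient of variation of one measured property across a protein's secondary structure elements, computed with something like `coefficient_of_variation`.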