Protein sequence classification using extreme learning machine

Traditionally, two protein sequences are assigned to the same class if they show high homology in terms of feature patterns extracted by sequence alignment algorithms. These algorithms compare an unseen protein sequence with all identified protein sequences and return the highest-scoring protein sequences. Because protein sequence databases are very large, an exhaustive comparison against all existing protein sequences is very time-consuming. There is therefore a need for an improved classification system that identifies protein sequences effectively. In this paper, a recently developed machine learning algorithm, the extreme learning machine (ELM), is used to classify protein sequences from ten super-family classes downloaded from a public domain database. A comparative study of system performance is conducted between ELM and the main conventional neural network classifier, the backpropagation (BP) neural network. Results show that ELM needs up to four orders of magnitude less training time than the BP network. The classification accuracy of ELM is also higher than that of the BP network. For a given network architecture, ELM has no control parameters (e.g., stopping criteria, learning rate, learning epochs) to be tuned manually and can be implemented easily.
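
To illustrate why ELM needs no learning rate, stopping criterion, or epoch count, the following minimal sketch shows the standard ELM formulation: hidden-layer weights are drawn at random and left fixed, and the output weights are obtained in a single step with the Moore-Penrose pseudoinverse. The function names (train_elm, predict_elm), the sigmoid activation, the hidden-layer size, and the synthetic two-class data are illustrative assumptions; they are not the feature extraction, data set, or network configuration used in this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_elm(X, y, n_hidden, n_classes, rng):
    """Fit an ELM: random fixed hidden layer, least-squares output layer."""
    # Input weights and biases are drawn once and never updated.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = sigmoid(X @ W + b)            # hidden-layer output matrix
    T = np.eye(n_classes)[y]          # one-hot target matrix
    # Output weights via the Moore-Penrose pseudoinverse (no iterations).
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Return the index of the largest output node for each sample."""
    return np.argmax(sigmoid(X @ W + b) @ beta, axis=1)

# Synthetic stand-in data (assumption): two well-separated Gaussian clusters,
# used here only in place of real protein sequence feature vectors.
rng = np.random.default_rng(0)
n_per_class = 100
X = np.vstack([
    rng.normal(-3.0, 1.0, size=(n_per_class, 2)),
    rng.normal(3.0, 1.0, size=(n_per_class, 2)),
])
y = np.repeat([0, 1], n_per_class)

W, b, beta = train_elm(X, y, n_hidden=20, n_classes=2, rng=rng)
accuracy = np.mean(predict_elm(X, W, b, beta) == y)
print(f"training accuracy: {accuracy:.3f}")
```

In this sketch the only choice left to the user is the number of hidden nodes, which belongs to the network architecture rather than to the training procedure.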