A novel feature fusion method for predicting protein subcellular localization with multiple sites

This paper proposes a novel feature fusion method for predicting protein subcellular localization at multiple sites. The protein encoding combines three types of features. The first is amino acid composition. The second is pseudo amino acid composition, which mainly captures the positional information of each amino acid residue in the protein sequence. The third is the local sequence information of the amino acids. k-nearest neighbor, support vector machine, and other methods have commonly been used for protein subcellular localization prediction. In our research, the multi-label k-nearest neighbor algorithm is employed as the classification model. The overall accuracy can reach 66.7304% on the Gnos-mploc dataset.
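
As a rough illustration of fusing sequence features by concatenation, the sketch below computes the amino acid composition of a sequence and joins it with a dipeptide composition vector. It is a minimal sketch under stated assumptions, not the encoding used in this paper: the dipeptide composition is only an assumed stand-in for the local sequence information, pseudo amino acid composition is omitted, and the function names (amino_acid_composition, dipeptide_composition, fuse_features) are hypothetical.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order ("AA", "AC", ..., "YY").
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def _standard_residues(seq):
    """Upper-case the sequence and drop any non-standard symbols."""
    return [r for r in seq.upper() if r in AMINO_ACIDS]


def amino_acid_composition(seq):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    residues = _standard_residues(seq)
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair (400 values).

    Assumption: used here as a simple proxy for local sequence
    information; the paper's own local feature may differ.
    """
    residues = _standard_residues(seq)
    pairs = [a + b for a, b in zip(residues, residues[1:])]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = {d: 0 for d in DIPEPTIDES}
    for pair in pairs:
        counts[pair] += 1
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


def fuse_features(seq):
    """Concatenate the two descriptors into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    vector = fuse_features("MKKLLPTAAAGLLLLAAQPAMA")
    print(len(vector))
```

A fused vector of this kind could then be passed to a multi-label classifier such as the multi-label k-nearest neighbor algorithm mentioned above. Plain concatenation is the simplest fusion choice; weighting or normalizing each feature group before joining is a common alternative.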