Protein subcellular localization of fluorescence microscopy images: Employing new statistical and Texton based image features and SVM based ensemble classification

Prediction system "IEH-GT" for protein localization in fluorescence microscopy images. Utilizing information about the patterns of intensity variations of proteins. New feature extraction based on individual features along various orientations. Discriminative learning through individual feature spaces of both GLCMs and Textons. Employing the learning capabilities of an ensemble using SVM and majority voting.

Image-based subcellular localization of proteins is a challenging pattern recognition task and is of prime importance in understanding the different functions of proteins. Extracting information about a protein's subcellular location with discriminative feature extraction strategies helps in understanding how different proteins behave under different circumstances. In this work, an intelligent prediction system for protein subcellular localization using fluorescence microscopy images is developed. The proposed prediction system, IEH-GT, uses a new feature extraction strategy and ensemble classification. The feature extraction mechanism exploits statistical and Texton-based image descriptors, whereas classification is performed by a majority-voting ensemble of SVMs. The novelty of the IEH-GT prediction system lies (a) in exploiting separately the feature spaces generated from individual Gray Level Co-occurrence Matrices (GLCMs) and from individual Texton images, and (b) in the manner in which the extracted features are exploited through the learning capabilities of the SVM, which serves as the base classifier in the majority-voting ensemble. The effect of the individual orientations of GLCMs and Texton images on protein recognition is also investigated. It is observed that GLCMs along different orientations characterize different information about the intensity distribution and the relative positions of pixels in a certain neighborhood. Similarly, generating features individually from each Texton image helps in reducing overlapping information. The proposed IEH-GT prediction system is thus able to effectively exploit information about the patterns of intensity variations of proteins. It is experimentally observed that, because the proposed IEH-GT system can differentiate the diverse surface structures of proteins, the mis-predictions of similar proteins such as Golgpp and Golgia are substantially reduced.
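The idea of keeping one feature space per GLCM orientation and combining the per-space decisions by majority voting can be illustrated with a minimal sketch. It is not the IEH-GT implementation: the function names, the pixel distance of 1, the symmetric counting, and the three statistics chosen (contrast, angular second moment, homogeneity) are assumptions made for illustration. The Texton features and the SVM base classifiers are omitted, so the votes passed to `majority_vote` are placeholders for the outputs of classifiers trained on each feature space.

```python
from collections import Counter

# Pixel offsets (row, col) for four common GLCM orientations at distance 1.
# The distance and the set of angles are illustrative assumptions.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, angle, levels):
    """Normalized, symmetric gray-level co-occurrence matrix for one orientation.

    `image` is a list of rows holding integer gray levels in [0, levels).
    """
    dr, dc = OFFSETS[angle]
    rows, cols = len(image), len(image[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count every pixel pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        return counts
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, angular second moment, and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in cells)
    asm = sum(p[i][j] ** 2 for i, j in cells)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i, j in cells)
    return [contrast, asm, homogeneity]


def per_orientation_features(image, levels):
    """One feature vector per orientation, kept separate rather than concatenated."""
    return {angle: glcm_features(glcm(image, angle, levels)) for angle in OFFSETS}


def majority_vote(predictions):
    """Most frequent label; ties go to the label that appears first."""
    return Counter(predictions).most_common(1)[0][0]


# Horizontal stripes: uniform along 0 degrees, alternating along 90 degrees.
stripes = [[0, 0, 0],
           [1, 1, 1],
           [0, 0, 0]]
features = per_orientation_features(stripes, levels=2)

# Placeholder votes standing in for classifiers trained on each feature space.
decision = majority_vote(["Golgia", "Golgpp", "Golgia"])
```

On the striped image the 0-degree contrast is zero while the 90-degree contrast is not, a small instance of the observation that different orientations capture different information. In a fuller pipeline, each orientation's feature vectors would train their own classifier, and only the resulting labels would be passed to the vote.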
