Protein subcellular localization in human and hamster cell lines: employing local ternary patterns of fluorescence microscopy images.

A discriminative feature extraction technique is essential for developing accurate and efficient prediction systems for protein subcellular localization, which in turn support the development of effective drugs. In this work, we show that Local Ternary Patterns (LTPs) effectively exploit the small variations in pixel intensities present in fluorescence microscopy images of proteins in human and hamster cell lines. The Synthetic Minority Oversampling Technique (SMOTE) is then applied to balance the feature space before the classification stage. We observed that LTPs coupled with this data balancing technique enable a classifier, in this case a support vector machine, to yield good performance. Evaluated with 10-fold cross-validation, the proposed ensemble-based prediction system outperformed existing techniques in predicting various subcellular compartments on both the 2D HeLa and CHO datasets. The proposed predictor is freely accessible to the public at http://111.68.99.218/Protein_SubLoc/.
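
To illustrate how a local ternary pattern captures small intensity variations, the following is a minimal sketch of the standard LTP encoding on a 3x3 neighbourhood. It is not the authors' implementation: the abstract does not state the threshold, radius, or neighbour ordering, so the threshold `t`, the clockwise bit order, and the function names `ltp_codes` and `ltp_histograms` are all assumptions made for illustration. Each neighbour is compared with the centre pixel using a tolerance band of width `t`, and the resulting ternary code is split into an "upper" and a "lower" binary pattern whose histograms can serve as texture features.

```python
# Hypothetical helper names; threshold and neighbour order are assumptions.
# Clockwise neighbour offsets (row, col), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def ltp_codes(image, r, c, t):
    """Return the (upper, lower) 8-bit LTP codes for the pixel at (r, c).

    A neighbour p contributes to the upper code if p >= centre + t,
    to the lower code if p <= centre - t, and to neither otherwise.
    """
    centre = image[r][c]
    upper = 0
    lower = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        p = image[r + dr][c + dc]
        if p >= centre + t:
            upper |= 1 << bit
        elif p <= centre - t:
            lower |= 1 << bit
    return upper, lower


def ltp_histograms(image, t):
    """Build 256-bin histograms of upper and lower codes over interior pixels."""
    upper_hist = [0] * 256
    lower_hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            u, l = ltp_codes(image, r, c, t)
            upper_hist[u] += 1
            lower_hist[l] += 1
    return upper_hist, lower_hist


# Toy 3x3 patch (made-up intensities, not taken from the HeLa or CHO data).
patch = [
    [54, 60, 49],
    [52, 55, 70],
    [40, 58, 55],
]
print(ltp_codes(patch, 1, 1, 5))  # prints (10, 68)
```

In this toy patch, neighbours within 5 intensity levels of the centre (54, 58, 52, 55) are treated as equal to it, which is what makes the ternary code less sensitive to noise than a plain binary comparison. In a pipeline like the one described, the concatenated histograms would form the feature vector that is balanced and then passed to the classifier; those stages are not shown here.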
