Detecting De Novo Plasmodesmata Targeting Signals and Identifying PD Targeting Proteins

Subcellular localization plays an important role in protein function. In this paper, we developed a hidden Markov model (HMM) to detect de novo signals in protein sequences that target proteins to a particular cellular location: plasmodesmata (PD). We also developed a support vector machine (SVM) to classify plasmodesmata-located proteins (PDLPs) in Arabidopsis, and devised a decision-tree approach that combines the SVM and HMM for better classification performance. The methods achieved a ROC score of 0.99 in cross-validation on a set of 360 type I transmembrane proteins in Arabidopsis. The predicted PD targeting signals in one PDLP have been experimentally verified.
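To make the evaluation and the combination step concrete, the following is a minimal sketch and not the authors' implementation. It computes a ROC score as the probability that a randomly chosen positive example outscores a randomly chosen negative one, and it combines two classifier scores with a small hand-written decision rule. The function names (`roc_auc`, `combine_scores`), the thresholds, and the toy scores are all hypothetical. The structure of the actual decision tree is not given in the abstract.

```python
def roc_auc(labels, scores):
    """ROC score via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 class labels (1 = PD-located, by assumption).
    scores: iterable of classifier scores, higher meaning more positive.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def combine_scores(svm_score, hmm_score, svm_hi=1.0, svm_lo=-1.0, hmm_cut=0.0):
    """Hypothetical two-level decision rule (placeholder thresholds).

    Trust the SVM when it is confident in either direction; otherwise
    fall back on the HMM signal score.
    """
    if svm_score >= svm_hi:
        return True
    if svm_score <= svm_lo:
        return False
    return hmm_score > hmm_cut


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.3, 0.1]
    print("ROC score:", roc_auc(toy_labels, toy_scores))
    print("Combined call:", combine_scores(svm_score=0.2, hmm_score=2.0))
```

In a cross-validation setting, the scores passed to `roc_auc` would be those assigned to held-out proteins in each fold, so that no protein is scored by a model that was trained on it.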
