SCMPSP: Prediction and characterization of photosynthetic proteins based on a scoring card method

Background: Photosynthetic proteins (PSPs) differ greatly in structure and function because they are involved in numerous subprocesses that take place inside an organelle called the chloroplast. Few studies predict PSPs from sequences because their sequences and structures are highly varied. This work aims to predict and characterize PSPs by establishing datasets of PSP and non-PSP sequences and developing prediction methods.

Results: A novel bioinformatics method for predicting and characterizing PSPs based on the scoring card method (SCMPSP) was used. First, a dataset was established consisting of 649 PSPs, selected using the Gene Ontology term GO:0015979, and 649 non-PSPs from the SwissProt database, with sequence identity <= 25%. Several prediction methods are presented, based on support vector machine (SVM), decision tree J48, Bayes, BLAST, and SCM. The SVM method using dipeptide features performed well and yielded a test accuracy of 72.31%. The SCMPSP method uses the estimated propensity scores of 400 dipeptides to be PSPs and has a test accuracy of 71.54%, which is comparable to that of the SVM method. The derived propensity scores of the 20 amino acids were further used to identify informative physicochemical properties for characterizing PSPs. The analytical results reveal the following four characteristics of PSPs: 1) PSPs favour amino acids with hydrophobic side chains; 2) PSPs are composed of amino acids prone to form helices in membrane environments; 3) PSPs have low interaction with water; and 4) PSPs prefer amino acids with electron-reactive side chains.

Conclusions: The SCMPSP method not only estimates the propensity of a sequence to be a PSP but also discovers characteristics that improve the understanding of PSPs. The SCMPSP source code and the datasets used in this study are available at http://iclab.life.nctu.edu.tw/SCMPSP/.
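The scoring idea summarized in the abstract can be illustrated with a minimal sketch. It assumes that a sequence is scored as the composition-weighted sum of 400 dipeptide propensity scores, and that an amino acid's propensity is the mean score of the dipeptides containing it. The function names, the 0 to 1000 score range, and the simple frequency-difference estimate in `initial_scores` are assumptions made for illustration; they are not taken from the SCMPSP source code, and any score optimization the authors perform is not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 dipeptides


def dipeptide_composition(seq):
    """Return the relative frequency of each of the 400 dipeptides in seq."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return {d: 0.0 for d in DIPEPTIDES}
    return {d: c / total for d, c in counts.items()}


def initial_scores(positives, negatives):
    """Hypothetical estimate: mean dipeptide composition in positives minus
    negatives, linearly rescaled to the range 0..1000."""
    def mean_comp(seqs):
        comps = [dipeptide_composition(s) for s in seqs]
        return {d: sum(c[d] for c in comps) / len(comps) for d in DIPEPTIDES}

    pos, neg = mean_comp(positives), mean_comp(negatives)
    diff = {d: pos[d] - neg[d] for d in DIPEPTIDES}
    lo, hi = min(diff.values()), max(diff.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all differences are equal
    return {d: 1000.0 * (diff[d] - lo) / span for d in DIPEPTIDES}


def sequence_score(seq, scores):
    """Score a sequence as the composition-weighted sum of dipeptide scores."""
    comp = dipeptide_composition(seq)
    return sum(comp[d] * scores[d] for d in DIPEPTIDES)


def amino_acid_propensity(scores):
    """Average the scores of all dipeptides that contain each amino acid."""
    result = {}
    for aa in AMINO_ACIDS:
        vals = [s for d, s in scores.items() if aa in d]
        result[aa] = sum(vals) / len(vals)
    return result


def predict(seq, scores, threshold):
    """Classify a sequence as positive when its score exceeds the threshold."""
    return sequence_score(seq, scores) > threshold
```

In use, `initial_scores` would be fitted on the training PSP and non-PSP sequences, and the threshold for `predict` would be chosen on training data (for example, the value that maximizes training accuracy). Because the classifier is a single weighted sum, the per-dipeptide and per-amino-acid scores remain directly inspectable, which is what makes the characterization step described in the abstract possible.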
