LECTINPred: Web Server that Uses Complex Networks of Protein Structure for Prediction of Lectins with Potential Use as Cancer Biomarkers or in Parasite Vaccine Design

Lectins (Ls) play an important role in many diseases, including different types of cancer and parasitic infections. Interestingly, the Protein Data Bank (PDB) contains more than 3000 protein 3D structures with unknown function. Thus, we can, in principle, discover new Ls by mining non-annotated structures from PDB or other sources. However, there are no general models to predict new biologically relevant Ls based on 3D chemical structures. We used the MARCH-INSIDE software to calculate Markov-Shannon 3D electrostatic entropy parameters for the complex networks of protein structure of 2200 different protein 3D structures, including 1200 Ls. We then performed a Linear Discriminant Analysis (LDA) using these parameters as inputs to seek a new Quantitative Structure-Activity Relationship (QSAR) model able to discriminate the 3D structures of Ls from those of other proteins. We implemented this predictor in a web server named LECTINPred, freely available at http://bio-aims.udc.es/LECTINPred.php. This web server showed the following goodness-of-fit statistics: Sensitivity = 96.7 % (for Ls), Specificity = 87.6 % (for non-active proteins), and Accuracy = 92.5 % (for all proteins), considering the training and external prediction series together. In operation mode 1, users can carry out an automatic retrieval of protein structures from PDB. We illustrated the use of this mode by data mining PDB: we predicted Ls scores for more than 2000 proteins with unknown function and selected the top-scored ones as possible lectins. In operation mode 2, LECTINPred can also accept uploaded 3D structural models generated with structure-prediction tools such as LOMETS or PHYRE2. The new Ls are expected to be of relevance as cancer biomarkers or useful in parasite vaccine design.
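
As a rough consistency check on the statistics reported above, the short Python sketch below recomputes sensitivity, specificity, and accuracy from a confusion matrix. The class sizes (1200 Ls and, by subtraction, 1000 other proteins) follow from the abstract; the counts of correctly classified proteins (1160 and 876) are not given in the source and are assumptions chosen only because they reproduce the rounded percentages. The function name `classification_stats` is likewise illustrative.

```python
def classification_stats(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = tp / (tp + fn)                 # fraction of lectins recovered
    specificity = tn / (tn + fp)                 # fraction of non-lectins rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # fraction of all proteins correct
    return sensitivity, specificity, accuracy


# Class sizes taken from the abstract: 2200 structures, 1200 of them lectins.
N_LECTINS = 1200
N_NON_LECTINS = 2200 - N_LECTINS

# ASSUMPTION: hypothetical counts of correct predictions, not from the source.
TRUE_POSITIVES = 1160
TRUE_NEGATIVES = 876

sens, spec, acc = classification_stats(
    tp=TRUE_POSITIVES,
    fn=N_LECTINS - TRUE_POSITIVES,
    tn=TRUE_NEGATIVES,
    fp=N_NON_LECTINS - TRUE_NEGATIVES,
)

print(f"Sensitivity = {100 * sens:.1f} %")
print(f"Specificity = {100 * spec:.1f} %")
print(f"Accuracy    = {100 * acc:.1f} %")
```

Because the two classes differ in size, accuracy is a weighted average of sensitivity and specificity, with weights 1200/2200 and 1000/2200, rather than their simple mean.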
