Predicting cancerlectins by the optimal g-gap dipeptides

Cancerlectins play a key role in tumor cell differentiation. A full understanding of their function is therefore important, because it can point to future directions for cancer therapy. However, traditional wet-lab experimental methods are costly and time-consuming, so an effective and efficient computational tool for identifying cancerlectins is highly desirable. In this study, we developed a sequence-based method to discriminate between cancerlectins and non-cancerlectins. Analysis of variance (ANOVA) was used to select the optimal feature set from the g-gap dipeptide composition. Jackknife cross-validation showed that the proposed method achieved an accuracy of 75.19%, which is superior to other published methods. For the convenience of other researchers, an online web server, CaLecPred, was established and is freely accessible at http://lin.uestc.edu.cn/server/CalecPred. We believe that CaLecPred is a powerful tool for studying cancerlectins and for guiding related experimental validation.
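The two steps named above, g-gap dipeptide composition and ANOVA-based feature ranking, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the common definition in which a g-gap dipeptide is a pair of residues separated by g intervening positions (g = 0 gives ordinary adjacent dipeptides), so each sequence yields a 400-dimensional frequency vector, and it ranks features by the one-way ANOVA F statistic (between-class variance over within-class variance). The function names `g_gap_composition`, `anova_f`, and `rank_features` are hypothetical, and the sketch skips pairs containing non-standard residues, which is a simplification of ours.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}  # dipeptide -> vector position


def g_gap_composition(seq, g):
    """Return the 400-dimensional g-gap dipeptide frequency vector of seq."""
    seq = seq.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    # Pair residue i with residue i + g + 1 (g residues lie between them).
    for i in range(len(seq) - g - 1):
        idx = INDEX.get(seq[i] + seq[i + g + 1])
        if idx is not None:  # assumption: ignore pairs with non-standard residues
            counts[idx] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(DIPEPTIDES)


def anova_f(groups):
    """One-way ANOVA F statistic for one feature; groups is a list of value lists."""
    k = len(groups)
    n_total = sum(len(grp) for grp in groups)
    grand_mean = sum(sum(grp) for grp in groups) / n_total
    means = [sum(grp) / len(grp) for grp in groups]
    ss_between = sum(len(grp) * (m - grand_mean) ** 2 for grp, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for grp, m in zip(groups, means) for x in grp)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0:
        # No within-class spread: perfectly separating or constant feature.
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


def rank_features(pos_vectors, neg_vectors):
    """Return feature indices sorted by decreasing F between the two classes."""
    n_features = len(pos_vectors[0])
    scores = [
        anova_f([[v[j] for v in pos_vectors], [v[j] for v in neg_vectors]])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

In a workflow of this kind, one would typically compute the composition for several values of g, rank the 400 features for each, and keep the top-ranked subset that gives the best cross-validated accuracy; the specific values of g and the subset size used in the study are not stated in this abstract.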
