BAMORF: A Novel Computational Method for Predicting the Extracellular Matrix Proteins

Extracellular matrix (ECM) proteins play a major role in the tissues of multicellular organisms. The ECM provides structural support for cells inside a tumor. It also works homeostatically to mediate the interaction between cells. However, current bioinformatics tools for predicting ECM proteins often fail. This paper introduces a method for predicting ECM proteins from the protein sequence as well as its molecular characteristics. We report a novel hybrid animal migration optimization and random forest method to predict ECM protein sequences, adopting four different feature design methods. Binary animal migration optimization (AMORF) is used to select a near-optimal subset of informative features that are most relevant for the classification. AMORF was evaluated on a data set of 145 ECM and 3887 non-ECM proteins. Our algorithm achieves an accuracy of 86.4700%, a sensitivity of 84.9655%, a specificity of 86.5261%, a Matthews correlation coefficient of 0.3627, and an area under the receiver operating characteristic curve of 0.877804. The results indicate that the proposed method is promising: it can choose small subsets of features and still improve classification efficiency.
