ECMPride: prediction of human extracellular matrix proteins based on the ideal dataset using hybrid features with domain evidence

Extracellular matrix (ECM) proteins play an essential role in various biological processes in multicellular organisms, and their abnormal regulation can lead to many diseases. Large-scale identification of ECM proteins, especially by proteomics-based techniques, requires a theoretical reference database of ECM proteins. In this study, we developed ECMPride, a flexible and scalable tool for predicting ECM proteins, by combining experimentally verified ECM datasets, protein domain features, and a machine learning model. ECMPride achieved excellent performance in predicting ECM proteins in terms of balanced accuracy and sensitivity, and it outperformed a previously developed tool. Applying ECMPride to all human entries in the SwissProt database yielded a new theoretical dataset of human ECM components, which contains a significant number of putative ECM proteins together with abundant biological annotations. This dataset might serve as a valuable reference resource for ECM protein identification.
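For readers unfamiliar with the two metrics named above, the following minimal sketch shows how sensitivity and balanced accuracy are conventionally computed from a binary confusion matrix. It is an illustration of the standard definitions only, not the evaluation code of ECMPride; the function names and the example counts are hypothetical and do not come from the study.

```python
# Standard definitions of sensitivity, specificity and balanced accuracy
# for a binary classifier (here: ECM vs. non-ECM protein).
# All counts below are hypothetical and chosen only for illustration.


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true ECM proteins that are predicted as ECM."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-ECM proteins that are predicted as non-ECM."""
    return tn / (tn + fp)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity; robust to class imbalance."""
    return (sensitivity(tp, fn) + specificity(tn, fp)) / 2


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Plain accuracy, shown for comparison."""
    return (tp + tn) / (tp + fn + tn + fp)


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 100 ECM and 1000 non-ECM proteins.
    tp, fn, tn, fp = 75, 25, 900, 100
    print("sensitivity      :", round(sensitivity(tp, fn), 3))
    print("specificity      :", round(specificity(tn, fp), 3))
    print("balanced accuracy:", round(balanced_accuracy(tp, fn, tn, fp), 3))
    print("plain accuracy   :", round(accuracy(tp, fn, tn, fp), 3))
```

Balanced accuracy is commonly preferred over plain accuracy when negatives greatly outnumber positives, as is typical when ECM proteins are screened against a whole proteome, because a classifier that labels everything as non-ECM would still score a high plain accuracy.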
