Building and analysis of protein-protein interactions related to diabetes mellitus using support vector machine, biomedical text mining and network analysis

Understanding the molecular mechanism underlying any disease requires knowledge of the interacting proteins in the disease pathway. The number of known protein-protein interactions (PPIs) is still very limited compared with the number of available protein sequences across organisms. Experimental high-throughput technologies provide some data on these interactions, but those data are often fairly noisy. Computational techniques for predicting protein-protein interactions are therefore important. A total of 1296 binary fingerprints that encode a combination of structural and geometric properties were developed using crystallographic data for 15,000 protein complexes in the Protein Data Bank (PDB). In a case study, these fingerprints were generated for proteins implicated in type 2 diabetes mellitus. The fingerprints were input to a support vector machine (SVM) model for discriminating disease proteins from non-disease proteins, yielding a classification accuracy of 78.2% (AUC of 0.78) on an external data set composed of proteins retrieved by text mining of diabetes-related literature. A PPI network was then constructed and analysed to explore new disease targets. The integrated approach exemplified here has potential for identifying disease-related proteins, for functional annotation, and for other proteomics studies.
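The two evaluation figures quoted above, classification accuracy and AUC, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the decision threshold of 0.0, and the toy labels and scores are all assumptions made for illustration, with the scores standing in for the decision values a trained SVM might assign to held-out proteins.

```python
def accuracy(labels, scores, threshold=0.0):
    """Fraction of proteins whose thresholded score matches the true label."""
    predictions = [1 if s > threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = disease protein, 0 = non-disease protein.
# The scores are invented decision values, not results from the paper.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [1.2, 0.4, -0.3, 0.1, -0.8, -1.5]

print(f"accuracy = {accuracy(toy_labels, toy_scores):.3f}")
print(f"AUC      = {roc_auc(toy_labels, toy_scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarises how well the scores rank disease proteins above non-disease proteins across all thresholds, which is why the two numbers are usually reported together.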
