3DCONS-DB: A Database of Position-Specific Scoring Matrices in Protein Structures

Many studies have used position-specific scoring matrix (PSSM) profiles to characterize residues in protein structures and to predict a broad range of protein features. PSSM profiles of Protein Data Bank (PDB) entries have also been recomputed in many works for different purposes. Although calculating a single PSSM profile is computationally affordable, many statistical studies and machine learning-based methods require thousands of profiles, which substantially increases the overall computational cost. In this work we present a new database that compiles PSSM profiles for the proteins of the PDB. Currently, the database contains 333,532 protein chain profiles covering 123,135 different PDB entries.
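
To illustrate what a PSSM profile stores, here is a minimal sketch. It is not the procedure used to build 3DCONS-DB. It scores each column of a small, hypothetical multiple sequence alignment as log2-odds against a uniform background (1/20 per amino acid), using an add-one pseudocount. The function name `pssm_from_alignment` and the toy alignment are illustrative assumptions. Production tools such as PSI-BLAST also apply sequence weighting, non-uniform background frequencies and substitution-matrix-based pseudocounts.

```python
import math

# The 20 standard amino acids; any other symbol (e.g. a gap "-") is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pssm_from_alignment(alignment, pseudocount=1.0):
    """Hypothetical helper: build a toy PSSM from aligned sequences.

    Returns one dict per alignment column, mapping each amino acid to its
    log2-odds score relative to a uniform background frequency of 1/20.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        # Count observed standard residues in this column.
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for seq in alignment:
            residue = seq[col]
            if residue in counts:
                counts[residue] += 1

        # Add the pseudocount to every amino acid so unseen residues
        # get a finite (negative) score instead of log(0).
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        profile.append(scores)
    return profile


if __name__ == "__main__":
    toy_alignment = ["ACDE", "ACDE", "AC-E", "GCDE"]
    for i, column in enumerate(pssm_from_alignment(toy_alignment)):
        best = max(column, key=column.get)
        print(i, best, round(column[best], 3))
```

Each row of such a profile holds 20 scores for one residue position. In the toy alignment, the first column scores A highest (3 of 4 sequences), then G, and every unobserved amino acid falls slightly below zero. A per-chain profile in a database of this kind is conceptually one such row per residue of the chain.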
