The Proteins API: accessing key integrated protein and genome information

Abstract The Proteins API provides search and programmatic access to protein and associated genomics data, such as curated protein sequence positional annotations from UniProtKB, as well as mapped variation and proteomics data from large scale data sources (LSS). Using the coordinates service, researchers can retrieve the genomic sequence coordinates for proteins in UniProtKB. These coordinates, together with the LSS genomics and proteomics data for UniProt proteins, are available programmatically only through this service. A Swagger UI has been implemented to provide documentation and an interface that lets users with little or no programming experience ‘talk’ to the services, quickly and easily formulate queries, and obtain dynamically generated source code for popular programming languages such as Java, Perl, Python and Ruby. Search results are returned as standard JSON, XML or GFF data objects. The Proteins API is a scalable, reliable, fast and easy-to-use set of RESTful services that provides a broad protein information resource. It lets users ask questions based on their field of expertise and gain an integrated overview of the available protein annotations, helping them understand the role of proteins in biological processes. The Proteins API is available at http://www.ebi.ac.uk/proteins/api/doc.
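
As a rough illustration of the programmatic access described above, the following Python sketch builds and sends a request to one of the services. The abstract does not spell out endpoint paths, so the base URL, the `coordinates/{accession}` path layout, the `text/x-gff` media type and the example accession `P35670` are all assumptions here; the Swagger documentation is the authoritative source for the actual routes and parameters. The helper names `build_request` and `fetch` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed base URL, derived from the documentation address in the abstract.
BASE_URL = "https://www.ebi.ac.uk/proteins/api"

# Response formats named in the abstract; the exact media types are assumptions.
MEDIA_TYPES = {
    "json": "application/json",
    "xml": "application/xml",
    "gff": "text/x-gff",
}


def build_request(service, accession, fmt="json"):
    """Build a GET request for one service and one UniProtKB accession.

    The `{service}/{accession}` path layout is an assumption for illustration.
    """
    url = f"{BASE_URL}/{service}/{quote(accession)}"
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(service, accession, fmt="json", timeout=30):
    """Send the request; parse JSON responses, return other formats as text."""
    request = build_request(service, accession, fmt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if fmt == "json" else body


if __name__ == "__main__":
    # Illustrative accession only; requires network access to run.
    coordinates = fetch("coordinates", "P35670")
    print(type(coordinates))
```

Selecting the response format through the `Accept` header rather than a query parameter follows common REST practice; whether the services also accept a format parameter is not stated in the abstract.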
