Fragger: a protein fragment picker for structural queries

Protein modeling and design often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work with a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that creates and queries protein fragment databases. It supports all fragment lengths, and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database given a query fragment and a distance threshold; matching fragments are ranked by their distance to the query. The query fragment can have structural gaps, and the amino acid sequences allowed to match a query can be constrained with a regular expression over one-letter amino acid codes. Fragger also incorporates a high-throughput tool to compute the backbone RMSD of one fragment versus many. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
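The kind of query described above (a distance threshold, ranking by distance, and a regular expression on the sequence) can be illustrated with a minimal Python sketch. It is not Fragger's implementation or interface: the names `drmsd` and `search`, the in-memory list used as a database, and the toy coordinates are all hypothetical. As a simplifying assumption, the distance is a superposition-free distance RMSD (dRMSD) over one point per residue rather than a backbone RMSD, and gapped queries are not handled.

```python
import math
import re
from itertools import combinations


def drmsd(a, b):
    """Distance RMSD between two equal-length lists of (x, y, z) points.

    Compares intra-fragment pairwise distances, so no superposition is
    needed. This is an assumed stand-in metric, not Fragger's own.
    """
    if len(a) != len(b):
        raise ValueError("fragments must have the same number of points")
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        total += (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
    return math.sqrt(total / len(pairs))


def search(database, query_coords, threshold, seq_pattern=".*"):
    """Return (distance, name) pairs within threshold, closest first.

    database is a list of (name, one-letter sequence, coordinates) tuples.
    seq_pattern must match the whole sequence of a candidate fragment.
    """
    regex = re.compile(seq_pattern)
    hits = []
    for name, sequence, coords in database:
        if len(coords) != len(query_coords):
            continue  # only same-length fragments are comparable here
        if not regex.fullmatch(sequence):
            continue  # sequence constraint not satisfied
        d = drmsd(query_coords, coords)
        if d <= threshold:
            hits.append((d, name))
    return sorted(hits)


# Toy data: three 3-residue fragments, one point per residue (hypothetical).
query = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
db = [
    ("frag_a", "GAG", [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]),
    ("frag_b", "GPG", [(1.0, 1.0, 1.0), (1.0, 4.8, 1.0), (1.0, 8.6, 1.0)]),
    ("frag_c", "AAA", [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]),
]

# Extended fragments only, and no proline allowed anywhere in the match.
for distance, name in search(db, query, threshold=1.0, seq_pattern="[^P]{3}"):
    print(f"{name}\t{distance:.3f}")
```

Without the sequence pattern, the same threshold also returns `frag_b`, which has the same shape as the query in a different orientation. Raising the threshold to 2.0 additionally returns the bent `frag_c`, ranked last.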
