An efficient and flexible scanning of databases of protein secondary structures

Protein secondary structures describe protein construction in terms of regular spatial shapes, including alpha-helices, beta-strands, and loops, that a protein's amino acid chain can adopt in some of its regions. This information supports protein classification, functional annotation, and 3D structure prediction. Its relevance and the breadth of its practical applications create a need for efficient storage and processing. Relational databases, widely used in commercial systems in recent years, are a serious alternative: they have been honed by years of experience, enriched with mature technologies, equipped with the declarative SQL query language, and accepted by a large community of programmers. Unfortunately, relational database management systems are not designed for the efficient storage and processing of biological data such as protein secondary structures. In this paper, we present a new search method implemented in the search engine of the PSS-SQL language. PSS-SQL allows users to formulate queries against a relational database in order to find proteins whose secondary structures are similar to a structural pattern specified by the user. We show how the search process can be accelerated by multiple scanning of the Segment Index and by a parallel implementation of the alignment procedure that uses multiple threads on multi-core CPUs.
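The abstract does not spell out the layout of the Segment Index or the exact alignment procedure, so the following is only a rough sketch of the general filter-then-align-in-parallel pattern, built on assumptions that are not taken from PSS-SQL: secondary structures are strings over H (helix), E (strand) and C (loop); a run-length segment encoding stands in for the Segment Index; a crude, lossy segment-type check stands in for index scanning; and a plain Smith-Waterman local alignment with made-up scores stands in for the alignment procedure. The names `segments`, `prefilter`, `local_alignment_score` and `parallel_search` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby

# Hypothetical scores for aligning SSE symbols (H = helix, E = strand, C = loop).
MATCH, MISMATCH, GAP = 2, -1, -2


def segments(sse: str) -> list[tuple[str, int]]:
    """Run-length encode an SSE string, e.g. 'HHHCCEE' -> [('H', 3), ('C', 2), ('E', 2)]."""
    return [(symbol, len(list(run))) for symbol, run in groupby(sse)]


def local_alignment_score(query: str, target: str) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty (two-row DP)."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (MATCH if q == t else MISMATCH)
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def prefilter(query: str, db: dict[str, str]) -> list[str]:
    """Keep proteins containing every segment type that occurs in the query.

    Heuristic stand-in for an index scan: it is lossy and may drop proteins
    that the local alignment would still score above zero.
    """
    needed = {symbol for symbol, _ in segments(query)}
    return [pid for pid, sse in db.items()
            if needed <= {symbol for symbol, _ in segments(sse)}]


def parallel_search(query: str, db: dict[str, str], workers: int = 4) -> list[tuple[str, int]]:
    """Align the query against all surviving candidates using a pool of workers."""
    candidates = prefilter(query, db)
    targets = [db[pid] for pid in candidates]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each candidate is aligned independently, so the work splits cleanly.
        scores = list(pool.map(local_alignment_score, [query] * len(targets), targets))
    # Best score first; ties broken by protein identifier.
    return sorted(zip(candidates, scores), key=lambda r: (-r[1], r[0]))


db = {"P1": "CCHHHHHHCCEEEECC", "P2": "CCCCCCCC", "P3": "HHHHCCEE"}
print(parallel_search("HHHHCCEEE", db))
```

Because the alignment function is a top-level function with no shared state, the same call works with `ProcessPoolExecutor`. In CPython that swap is what actually spreads pure-Python alignment loops across cores, since the global interpreter lock keeps threads from running such CPU-bound code in parallel.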