PSS-SQL: Protein Secondary Structure - Structured Query Language

The secondary structure representation of proteins provides important information about their general construction and shape. This representation is often used in protein similarity searching. Since existing commercial database management systems do not offer integrated exploration methods for biological data, e.g., at the level of the SQL language, structural similarity searching is usually performed by external tools. In this paper, we present our newly developed PSS-SQL language, which allows a database to be searched for proteins whose secondary structure is similar to the structure specified by the user in a PSS-SQL query. PSS-SQL thus provides a simple, declarative language for protein structure similarity searching.
