Query language for protein molecular structures

The secondary structure representation of a protein provides important information about its general construction and shape, and it is often used in protein similarity searching. Because existing commercial database management systems do not offer integrated methods for exploring biological data, e.g. at the level of the SQL language, structural similarity searching is usually performed by external tools. In this paper, we present our newly developed PSS-SQL language, which allows a database to be searched for proteins whose secondary structure is similar to the structure specified by the user in a PSS-SQL query. In this way, we provide a simple, declarative language for protein structure similarity searching.
