A DECLARATIVE QUERY LANGUAGE FOR PROTEIN SECONDARY STRUCTURES

Searching proteins by their secondary structures provides a rough but fast method for identifying molecules with a similar fold. Since existing database management systems do not offer integrated exploration methods for querying protein structures, structural similarity searching is usually performed by external tools. This often lengthens processing time and requires additional processing steps, such as adapting input and output data formats. In this paper, we present an extended SQL language that allows searching a database for proteins whose secondary structures are similar to a structural pattern specified by the user. The presented query language is integrated with the relational database management system and simplifies the manipulation of biological data.
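The idea of running the structural pattern search inside the database, rather than in an external tool, can be illustrated with a minimal sketch. This is not the query language presented in the paper. It assumes that secondary structures are stored as strings over H (helix), E (strand) and C (coil), uses a regular expression as a stand-in for the user's structural pattern, and registers a hypothetical ss_match function in an in-memory SQLite database. The table name, column names and sample rows are invented for illustration only.

```python
import re
import sqlite3


def ss_match(pattern: str, ss: str) -> int:
    """Hypothetical SQL function: 1 if the structure string contains the pattern."""
    return 1 if re.search(pattern, ss) else 0


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up secondary-structure strings."""
    conn = sqlite3.connect(":memory:")
    # Make the Python function callable from SQL as ss_match(pattern, ss).
    conn.create_function("ss_match", 2, ss_match)
    conn.execute("CREATE TABLE proteins (id TEXT PRIMARY KEY, ss TEXT)")
    conn.executemany(
        "INSERT INTO proteins VALUES (?, ?)",
        [
            ("P1", "CCHHHHHHHCCEEEEECC"),   # illustrative data, not real proteins
            ("P2", "CCEEEEECCCEEEEECCC"),
            ("P3", "HHHHHHCCCEEEECCHHHH"),
        ],
    )
    return conn


def find_similar(conn: sqlite3.Connection, pattern: str) -> list:
    """Return ids of proteins whose structure string matches the pattern."""
    rows = conn.execute(
        "SELECT id FROM proteins WHERE ss_match(?, ss) = 1 ORDER BY id",
        (pattern,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    # A helix of 5 to 10 residues, a short coil, then a strand of at least 3.
    print(find_similar(conn, "H{5,10}C{1,4}E{3,}"))
```

Because the matching function is called from within the SELECT statement, the result can be filtered, joined or sorted with ordinary SQL, with no conversion of input and output formats between the database and a separate tool.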
