ProML - the Protein Markup Language for specification of protein sequences, structures and families

We propose ProML, a specification language for protein sequences, structures, and families based on the open XML standard. The language provides a portable, system-independent, machine-parsable, and human-readable representation of the essential features of proteins. It is of immediate use in several bioinformatics applications: we discuss clustering proteins into families and representing the specific features shared within each cluster. Moreover, we use ProML to specify the data used in fold recognition benchmarks that exploit experimentally derived distance constraints.
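To illustrate what "machine-parsable" buys in practice, the following minimal Python sketch reads a small ProML-style document with the standard library's `xml.etree.ElementTree`. It collects each protein's sequence and distance constraints, then computes the positions shared by all members of a family. The element and attribute names (`proml`, `protein`, `sequence`, `constraint`, `family`, `member`, `residue1`, `residue2`, `max_distance`) are hypothetical placeholders, not the actual ProML schema. The conserved-position helper also assumes ungapped sequences of equal length, which is a simplification for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical ProML-style document: element and attribute names are
# illustrative assumptions, not the official ProML schema.
DOC = """<proml>
  <protein id="P1" name="example_protein">
    <sequence>MKTAYIAKQR</sequence>
    <constraint residue1="2" residue2="8" max_distance="12.0"/>
  </protein>
  <protein id="P2" name="example_homolog">
    <sequence>MKTAYLAKQR</sequence>
  </protein>
  <family id="F1">
    <member ref="P1"/>
    <member ref="P2"/>
  </family>
</proml>"""


def load_proteins(root):
    """Map protein id -> (sequence, list of (res1, res2, max_distance))."""
    proteins = {}
    for prot in root.iter("protein"):
        seq = prot.findtext("sequence", default="").strip()
        constraints = [
            (int(c.get("residue1")), int(c.get("residue2")),
             float(c.get("max_distance")))
            for c in prot.findall("constraint")
        ]
        proteins[prot.get("id")] = (seq, constraints)
    return proteins


def load_families(root):
    """Map family id -> list of member protein ids."""
    return {
        fam.get("id"): [m.get("ref") for m in fam.findall("member")]
        for fam in root.iter("family")
    }


def conserved_positions(member_ids, proteins):
    """1-based positions where every member carries the same residue.

    Assumes ungapped sequences of equal length (illustrative simplification).
    """
    seqs = [proteins[m][0] for m in member_ids]
    if len(set(map(len, seqs))) != 1:
        raise ValueError("sketch assumes equal-length, ungapped sequences")
    # zip(*seqs) yields one tuple of residues per alignment column.
    return [i + 1 for i, column in enumerate(zip(*seqs)) if len(set(column)) == 1]


root = ET.fromstring(DOC)
proteins = load_proteins(root)
families = load_families(root)
print(conserved_positions(families["F1"], proteins))  # [1, 2, 3, 4, 5, 7, 8, 9, 10]
```

Because the markup is plain XML, any generic parser can consume it. A real specification would normally pin down the permitted elements with a DTD or XML Schema, which this sketch omits.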
