MAO: a Multiple Alignment Ontology for nucleic acid and protein sequences

The application of high-throughput approaches such as genomics, proteomics and transcriptomics means that vast amounts of heterogeneous data are now available in public databases. Bioinformatics is responding to this challenge with new integrated management systems for data collection, validation and analysis. Multiple alignments of genomic and protein sequences provide an ideal environment for integrating this mass of information: in the context of a sequence family, structural and functional data can be evaluated and propagated from known to unknown sequences. However, effective integration is hindered by syntactic and semantic differences between the data resources and between the alignment techniques employed. One solution to this problem is an ontology that systematically defines the terms used in a specific domain. Ontologies are used to share data from different resources, to analyse information automatically and to represent domain knowledge for non-experts. Here, we present MAO, a new ontology for multiple alignments of nucleic acid and protein sequences. MAO is designed to improve interoperation and data sharing between different alignment protocols, supporting the construction of high-quality, reliable multiple alignments and thereby facilitating knowledge extraction and the presentation of the most pertinent information to the biologist.
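Within a sequence family, structural and functional data can be propagated from known to unknown sequences. The minimal Python sketch below illustrates why an alignment makes this possible: per-residue annotations on one aligned sequence are transferred to another through the alignment columns they share. This is not MAO's implementation. The function names, the `-` gap symbol and the toy sequences are hypothetical assumptions chosen only for illustration.

```python
from __future__ import annotations

GAP = "-"  # assumed gap symbol for this sketch

def column_to_residue(aligned: str) -> dict[int, int]:
    """Map each non-gap alignment column (0-based) to the 1-based residue
    number it holds in the ungapped sequence."""
    mapping: dict[int, int] = {}
    residue = 0
    for col, char in enumerate(aligned):
        if char != GAP:
            residue += 1
            mapping[col] = residue
    return mapping

def propagate(annotations: dict[int, str], source: str, target: str) -> dict[int, str]:
    """Hypothetical helper: move annotations keyed by 1-based residue number
    in `source` onto the residue of `target` found in the same column.
    Annotations that land on a gap in `target` are dropped."""
    if len(source) != len(target):
        raise ValueError("aligned sequences must have equal length")
    # Invert the source map so we can go residue -> column.
    src_res_to_col = {res: col for col, res in column_to_residue(source).items()}
    tgt_col_to_res = column_to_residue(target)
    transferred: dict[int, str] = {}
    for res, label in annotations.items():
        col = src_res_to_col.get(res)
        if col is not None and col in tgt_col_to_res:
            transferred[tgt_col_to_res[col]] = label
    return transferred

if __name__ == "__main__":
    known = "MK-TAYI"    # annotated sequence (toy example)
    unknown = "M-RTA-I"  # unannotated family member (toy example)
    labels = {1: "metal binding", 3: "catalytic", 5: "phosphorylation"}
    print(propagate(labels, known, unknown))
    # → {1: 'metal binding', 3: 'catalytic'}
```

Annotations that fall on a gap in the target (here residue 5 of the known sequence) are dropped rather than guessed. Any such transfer is only as trustworthy as the alignment it relies on, which is why alignment quality matters for propagation.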
