Motif-Aware PRALINE: Improving the alignment of motif regions

Protein or DNA motifs are sequence regions of biological importance that are often highly conserved among homologous sequences. Generating multiple sequence alignments (MSAs) in which conserved sequence motifs are correctly aligned remains difficult, because the contribution of these typically short fragments is overshadowed by the rest of the sequence. Here we extend the PRALINE multiple sequence alignment program with a novel motif-aware MSA algorithm to address this shortcoming. The method incorporates explicit information about the presence of externally provided sequence motifs, which is used in the dynamic programming step to boost the amino acid substitution matrix towards the motif. The strength of the boost is controlled by a parameter, α. Using a benchmark set of alignments, we confirm that a good compromise can be found that improves the matching of motif regions while not significantly reducing overall alignment quality. By estimating α on an unrelated set of reference alignments, we find that there is indeed a strong conservation signal for motifs. A number of typical but difficult MSA use cases are explored to exemplify the problems in correctly aligning functional sequence motifs and to show how the motif-aware alignment method can be employed to alleviate them.
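
The idea of boosting substitution scores at motif positions during dynamic programming can be illustrated with a minimal pairwise sketch. This is not the PRALINE implementation: the abstract does not state the scoring formula, so the additive bonus `alpha` applied when both residues lie inside a motif match is an assumption, as are the simple match/mismatch scores (standing in for a real substitution matrix), the linear gap penalty, and the helper names `motif_mask` and `motif_aware_align`. Motifs are given here as regular expressions purely for illustration.

```python
import re


def motif_mask(seq, pattern):
    """Mark positions of seq covered by a match of the regex `pattern` (hypothetical helper)."""
    mask = [False] * len(seq)
    for hit in re.finditer(pattern, seq):
        for pos in range(hit.start(), hit.end()):
            mask[pos] = True
    return mask


def motif_aware_align(a, b, mask_a, mask_b, alpha=2.0,
                      match=1.0, mismatch=-1.0, gap=-1.0):
    """Global pairwise alignment with an assumed additive motif bonus `alpha`."""

    def pair_score(i, j):
        # Toy substitution score; a real aligner would look up a matrix such as BLOSUM62.
        s = match if a[i] == b[j] else mismatch
        # Assumption: add alpha only when both positions are annotated as motif.
        if mask_a[i] and mask_b[j]:
            s += alpha
        return s

    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap

    # Fill the dynamic programming matrix.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i - 1, j - 1),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i - 1, j - 1):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# Example call with a made-up motif pattern "CD"; alpha = 0 recovers the plain alignment.
seq_a, seq_b = "ACDEF", "CDE"
aligned_a, aligned_b, total = motif_aware_align(
    seq_a, seq_b, motif_mask(seq_a, "CD"), motif_mask(seq_b, "CD"), alpha=2.0
)
```

Setting `alpha` to zero reduces the sketch to an ordinary global alignment, so the parameter can be varied to explore the trade-off between matching motif regions and overall alignment quality that the abstract describes.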
