Adaptive Local Realignment via Parameter Advising

Mutation rates can vary across the residues of a protein, but when multiple sequence alignments are computed for protein sequences, the same substitution scores and gap penalties are typically used across their entire length. We present a new method, called adaptive local realignment, that automatically uses different alignment parameter settings in different regions of the input sequences when computing protein multiple sequence alignments. This allows parameter settings to adapt locally along the length of a protein to more closely match varying mutation rates. Our method builds on our prior work on global alignment parameter advising with the Facet alignment accuracy estimator. Given a computed alignment, for each region that has low estimated accuracy, a collection of candidate realignments is generated using a precomputed set of alternate parameter choices. If one of these candidate realignments has higher estimated accuracy than the original subalignment, the region is replaced with that realignment, and the output alignment is the concatenation of the resulting regions. Adaptive local realignment significantly improves alignment quality over using the single best default parameter choice. In particular, this new method of local advising, when combined with prior methods for global advising, boosts alignment accuracy by almost 23% over the best default parameter setting on the hardest-to-align benchmarks (and by almost 5.9% over using global advising alone). A new version of the Opal multiple sequence aligner that incorporates adaptive local realignment, using Facet for parameter advising, is available free for non-commercial use at facet.cs.arizona.edu. This poster abstract summarizes a preprint available on bioRxiv [1].
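
The realignment procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Opal or Facet implementation: the `estimate` and `realign` callables are hypothetical stand-ins for the accuracy estimator and the aligner, and the fixed-width column windows and the score threshold are assumptions, since the abstract does not say how low-accuracy regions are identified. The toy estimator and toy realigner exist only to make the sketch runnable.

```python
from typing import Callable, List, Sequence, Tuple

Alignment = List[str]  # one string per sequence, equal lengths, '-' marks a gap


def columns(aln: Alignment, start: int, end: int) -> Alignment:
    """Slice the column range [start, end) out of every row."""
    return [row[start:end] for row in aln]


def low_accuracy_regions(aln: Alignment, estimate: Callable[[Alignment], float],
                         window: int, threshold: float) -> List[Tuple[int, int]]:
    """Assumed region finder: score fixed-width windows, merge adjacent low ones."""
    width = len(aln[0])
    regions: List[Tuple[int, int]] = []
    for start in range(0, width, window):
        end = min(start + window, width)
        if estimate(columns(aln, start, end)) < threshold:
            if regions and regions[-1][1] == start:
                regions[-1] = (regions[-1][0], end)  # extend the previous region
            else:
                regions.append((start, end))
    return regions


def adaptive_local_realign(aln: Alignment, params: Sequence[object],
                           estimate: Callable[[Alignment], float],
                           realign: Callable[[List[str], object], Alignment],
                           window: int = 10, threshold: float = 0.5) -> Alignment:
    """Replace each low-scoring region with its best-scoring candidate, if any."""
    pieces: List[Alignment] = []
    cursor = 0
    for start, end in low_accuracy_regions(aln, estimate, window, threshold):
        pieces.append(columns(aln, cursor, start))  # region kept unchanged
        original = columns(aln, start, end)
        best, best_score = original, estimate(original)
        residues = [row.replace("-", "") for row in original]
        for choice in params:  # precomputed set of alternate parameter choices
            candidate = realign(residues, choice)
            score = estimate(candidate)
            if score > best_score:  # keep only strict improvements
                best, best_score = candidate, score
        pieces.append(best)
        cursor = end
    pieces.append(columns(aln, cursor, len(aln[0])))
    # Concatenate the pieces row by row to form the output alignment.
    return ["".join(piece[i] for piece in pieces) for i in range(len(aln))]


def toy_estimate(sub: Alignment) -> float:
    """Toy score: fraction of columns that are gap-free with identical residues."""
    width = len(sub[0])
    if width == 0:
        return 1.0
    good = sum(1 for col in zip(*sub) if "-" not in col and len(set(col)) == 1)
    return good / width


def toy_realign(residues: List[str], side: str) -> Alignment:
    """Toy 'aligner': the parameter choice only selects which side gets gaps."""
    longest = max(len(r) for r in residues)
    if side == "left":
        return [r.rjust(longest, "-") for r in residues]
    return [r.ljust(longest, "-") for r in residues]


if __name__ == "__main__":
    start_aln = ["AAAA-CG", "AAAAC-G"]
    print(adaptive_local_realign(start_aln, ["right", "left"],
                                 toy_estimate, toy_realign, window=4))
```

In this sketch a candidate replaces the original subalignment only when its estimated score is strictly higher, so a region where no alternate parameter choice helps is left as it was.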