Misleading local sequence alignments: implications for comparative protein modelling.

Although it is well known that significant sequence similarity between proteins is reflected at the structural level, it is commonly assumed that any misaligned regions, as judged against the correct structure-based alignment, are those where the local sequence identity is lower than the global identity. Recent studies have shown that this is not always the case: there can exist short stretches of high local identity that are not reflected in the structure-based alignment. An analysis of 290 pairs of homologous proteins is presented with a view to quantifying the occurrence of these misleading local sequence alignments (MLSAs). It is found that such MLSAs are likely if the global sequence identity is less than 40% and can occur even when it is greater than 60%. The results have implications for automated homology modelling and also for the inference of function made by comparison.
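
To make the notion of an MLSA concrete, the following is a minimal sketch, not the procedure used in the analysis. It assumes that a candidate MLSA is an ungapped window of the sequence-based alignment whose local identity exceeds the global identity while none of its residue pairs occurs in the structure-based alignment. The window length of 5, the function names, and the toy sequences are all illustrative assumptions.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def column_pairs(a, b):
    """For each alignment column, the (i, j) residue indices, or None if gapped."""
    i = j = 0
    cols = []
    for x, y in zip(a, b):
        cols.append((i, j) if x != "-" and y != "-" else None)
        if x != "-":
            i += 1
        if y != "-":
            j += 1
    return cols


def candidate_mlsas(seq_aln, struct_aln, window=5):
    """Return (global identity, [(start column, local identity), ...]).

    A window is flagged when it is ungapped, its local identity is higher
    than the global identity, and none of its residue pairs appears in the
    structure-based alignment (an assumed, simplified criterion).
    """
    a, b = seq_aln
    global_id = percent_identity(a, b)
    seq_cols = column_pairs(a, b)
    struct_pairs = {p for p in column_pairs(*struct_aln) if p is not None}
    hits = []
    for start in range(len(a) - window + 1):
        cols = seq_cols[start:start + window]
        if None in cols:
            continue  # skip windows that contain a gap
        local_id = percent_identity(a[start:start + window],
                                    b[start:start + window])
        if local_id > global_id and not any(p in struct_pairs for p in cols):
            hits.append((start, local_id))
    return global_id, hits


# Toy example: the sequence alignment pairs residues in register, whereas
# the (hypothetical) structure-based alignment is shifted by two positions.
seq_aln = ("ACDEFGHIKL", "ACDEFWWWWW")
struct_aln = ("ACDEFGHIKL--", "--ACDEFWWWWW")
global_id, hits = candidate_mlsas(seq_aln, struct_aln)
```

In this toy case the identical stretch at the start of the sequence-based alignment is flagged because the structure-based alignment pairs those residues differently; passing the same alignment as both arguments flags nothing.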