Sampling rare events: statistics of local sequence alignments.

A method is presented for calculating probability distributions in regions where events are very unlikely (e.g., p ≈ 10^-40). The basic idea is to map the underlying model onto a physical system. The system is simulated at a low temperature, so that configurations with originally low probabilities are generated preferentially. Since the distribution of such a physical system is known, the original unbiased distribution can be recovered from the biased samples. As an application, local alignment of protein sequences is studied. The deviation of the distribution p(S) of optimum scores from the extreme-value distribution is quantified. This deviation decreases with growing sequence length.
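
The biasing and reweighting idea can be illustrated with a minimal sketch on a toy model; it is not the alignment computation itself. The assumptions here are all illustrative: the "sequence" is a string of n random bits, the score S is simply the number of ones (a stand-in for an alignment score), and the biased ensemble uses the Boltzmann-like weight exp(S/T) sampled with single-bit-flip Metropolis moves. The function names `score`, `sample_biased`, and `reweight` are hypothetical. The biased histogram h_T(S) is converted back to the unbiased distribution via log p(S) = log h_T(S) - S/T + const.

```python
import math
import random
from collections import Counter


def score(bits):
    """Toy score: number of ones (illustrative stand-in for an alignment score)."""
    return sum(bits)


def sample_biased(n, temperature, steps, seed=0):
    """Metropolis sampling of n-bit strings with weight p(x) * exp(S(x) / T).

    The unbiased model p(x) is uniform over all bit strings, so only the
    bias factor exp(S/T) enters the acceptance rule. Returns a histogram
    of the scores visited.
    """
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(n)]
    s = score(bits)
    hist = Counter()
    for _ in range(steps):
        i = rng.randrange(n)
        delta = 1 - 2 * bits[i]  # score change if bit i is flipped
        # Accept with probability min(1, exp(delta / T)).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            bits[i] ^= 1
            s += delta
        hist[s] += 1
    return hist


def reweight(hist, temperature):
    """Undo the bias: log p(S) = log h_T(S) - S/T, up to an additive constant."""
    return {s: math.log(count) - s / temperature for s, count in hist.items()}


if __name__ == "__main__":
    n, temperature = 40, 1.0
    log_p = reweight(sample_biased(n, temperature, steps=200_000), temperature)
    # Compare a ratio of tail probabilities with the exact binomial result.
    estimated = log_p[33] - log_p[29]
    exact = math.log(math.comb(n, 33)) - math.log(math.comb(n, 29))
    print(f"estimated log ratio: {estimated:.2f}, exact: {exact:.2f}")
```

The sketch leaves the additive constant (the logarithm of the normalization of the biased ensemble) undetermined, so only ratios of probabilities are compared. One possible way to fix the constant, assumed here rather than taken from the text, is to match the reweighted curve to a directly sampled histogram in a score range where both have good statistics. Lower temperatures push the sampled scores further into the tail.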