Selection of DNA binding sites by regulatory proteins. Functional specificity and pseudosite competition.

The frequency with which each base pair occurs in a set of recognition sequences for a particular DNA-binding protein is strongly related to the contribution that base pair makes to the binding free energy. Thus, from the statistics of base-pair choice, it is possible to estimate the relative binding strength of any base-pair sequence and to predict the effect of point mutations in specific sites. On the same basis, one can describe the binding properties of random DNA sequences and thereby the expected competitive effects of all the nonspecific DNA sites in the genome of a living cell. The statistical selection theory describing these relations [Berg & von Hippel, J. Mol. Biol. 193 (1987) 723-750] is extended and tested with computer simulations. The theory is shown to hold up well even when base pairs contribute cooperatively to the binding interaction. The simulations also demonstrate the effects of the small-sample statistical uncertainty that arises because every set of identified recognition sites is limited in size.
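
As an illustration of how base-pair statistics can be turned into relative binding strengths, the following is a minimal sketch and not the paper's own implementation. It assumes that base pairs contribute independently (the additive case, not the cooperative one) and that the energy penalty of base B at position l, in dimensionless units, is ln[(n_max + 1)/(n_B + 1)], where n_B is the number of known sites with B at that position and n_max is the count of the most common base there. That log-ratio form with a small-sample correction of one is an assumption of this sketch. The aligned sites and all function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def count_matrix(sites):
    """Count how often each base occurs at each position of aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(s[pos] for s in sites) for pos in range(length)]

def energy_matrix(sites):
    """Estimate a per-position energy penalty for each base.

    Assumed form: ln((n_max + 1) / (n_B + 1)), in dimensionless units.
    The most common base at a position gets a penalty of zero.
    """
    matrix = []
    for col in count_matrix(sites):
        n_max = max(col[b] for b in BASES)
        matrix.append({b: math.log((n_max + 1) / (col[b] + 1)) for b in BASES})
    return matrix

def discrimination_energy(seq, matrix):
    """Sum the per-position penalties (independent base pairs assumed)."""
    return sum(col[base] for col, base in zip(matrix, seq))

# Hypothetical aligned recognition sites, invented for illustration only.
example_sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT", "GATAAT"]
matrix = energy_matrix(example_sites)

# The consensus sequence takes the lowest-penalty base at each position.
consensus = "".join(min(BASES, key=lambda b: col[b]) for col in matrix)

# Predicted penalty of a point mutation at the last position (T -> C).
mutant_penalty = discrimination_energy("TATAAC", matrix)
```

In this sketch the consensus sequence has zero penalty by construction, and a point mutation to a base never seen at that position receives the largest penalty available there. With only five sites, the estimates are dominated by the small-sample uncertainty that the abstract mentions.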