Prediction of interaction energies of substituted hydrogen-bonded Watson-Crick cytosine:guanine(8X) base pairs.

We investigated the variation in the interaction energy between the Watson-Crick hydrogen-bonded DNA bases guanine and cytosine (G(8X):C), where guanine is substituted in the C8 position by 37 different functional groups. Base pairs were optimized at the B3LYP/6-311+G(2d,p) level. Remarkably, a guanine derivative carrying a more strongly electron-withdrawing group forms a more stable base pair with C. Multivariate linear regression provided a quantitative relationship between the interaction energies and descriptors generated by the quantum chemical topology (QCT) approach. The descriptors were sampled from the monomers only, not the supermolecular base pair complexes. A model with r2 = 0.96 and a root-mean-square (rms) error of 0.6 kJ/mol was obtained for a training set of 28 base pair complexes. The model was then tested on an external test set of 9 complexes, yielding r2 = 0.99 and an rms error of 0.2 kJ/mol. The results indicated that the bonds C6=O6 and N2-H2 at the hydrogen-bonded frontier of the guanine derivatives play an important role in transmitting the substituent effects. A linear correlation between substitution energies and Hammett constants (sigma(m)) was also obtained for all 37 substituents, yielding r2 = 0.82 and an rms error of 1.2 kJ/mol. The model based on QCT descriptors can therefore be used to predict the interaction energy of the base pair G(8X):C based solely on data for the G(8X) monomers.
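The regression workflow described above (fit on 28 complexes, validate on 9 held-out complexes, report r2 and rms error) can be illustrated with a minimal sketch. Everything numerical here is an assumption made for illustration: the descriptor matrix and energies are synthetic random numbers, not QCT descriptors or computed interaction energies, the number of descriptors (3) is arbitrary, and r2 is taken as the coefficient of determination, which may differ from the definition used in the study. The function names are hypothetical.

```python
import numpy as np

def fit_linear_model(X, y):
    """Ordinary least squares with an intercept term.
    Returns [intercept, coefficient_1, ..., coefficient_k]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Evaluate the fitted linear model on a descriptor matrix."""
    return coef[0] + X @ coef[1:]

def r2_and_rms(y_true, y_pred):
    """Coefficient of determination and root-mean-square error."""
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, np.sqrt(np.mean(resid ** 2))

# Synthetic stand-in data (NOT real descriptors or energies).
rng = np.random.default_rng(0)
n_total, n_train, n_desc = 37, 28, 3       # 37 = 28 training + 9 test
X = rng.normal(size=(n_total, n_desc))      # placeholder monomer descriptors
true_coef = np.array([-100.0, 2.0, -1.5, 0.8])  # arbitrary illustrative values
y = true_coef[0] + X @ true_coef[1:] + rng.normal(scale=0.5, size=n_total)

# Random split into a training set and an external test set.
idx = rng.permutation(n_total)
train, test = idx[:n_train], idx[n_train:]

coef = fit_linear_model(X[train], y[train])
r2_train, rms_train = r2_and_rms(y[train], predict(coef, X[train]))
r2_test, rms_test = r2_and_rms(y[test], predict(coef, X[test]))

print(f"training: r2 = {r2_train:.2f}, rms = {rms_train:.2f}")
print(f"test:     r2 = {r2_test:.2f}, rms = {rms_test:.2f}")
```

The point of the split is that the test-set statistics are computed on complexes the fit never saw, so they measure predictive rather than descriptive quality. With only 9 test points, the test r2 is sensitive to the particular split.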