Modeling antibody hypervariable loops: A combined algorithm (loop replacement/antigen combining site/complementarity-determining regions)

To be of any value, a predicted model of an antibody combining site should have an accuracy approaching that of antibody structures determined by x-ray crystallography (1.6-2.7 Å). A number of modeling protocols have been proposed, which fall into two main categories: those that adopt a knowledge-based approach and those that attempt to construct the hypervariable loop regions of the antibody ab initio. Here we present a combined algorithm requiring no arbitrary decisions on the part of the user, which has been successfully applied to the modeling of the individual loops in two systems: the anti-lysozyme antibody HyHel-5, the crystal structure of which is available as a complex with lysozyme [Sheriff, S., Silverton, E. W., Padlan, E. A., Cohen, G. H., Smith-Gill, S. J., Finzel, B. C. & Davies, D. R. (1987) Proc. Natl. Acad. Sci. USA 84, 8075-8079], and the free antigen binding fragment (Fab) of the anti-lysozyme peptide antibody, Gloop2. This protocol may be used with a high degree of confidence to model single-loop replacements, insertions, deletions, and side-chain replacements. In addition, it may be used in conjunction with other modeling protocols as a method by which to model particular loops whose conformations are predicted poorly by these methods.

The wide range of specificities exhibited by antibodies is a function of the sequence and length variability of six hypervariable loops or complementarity-determining regions (CDRs) (1), which form the antigen combining site. These six CDRs, supported on a highly conserved framework region, constitute the variable region of the antigen binding fragment (Fab). A knowledge of antibody structure is essential for intelligent design of antibody enzymes (2), tailoring of affinity (3), and CDR replacement strategies (4). 
However, sequence information vastly exceeds structural information from x-ray crystallography and, until crystallographic structure determination becomes no less routine than sequencing, modeling of structures is necessary. Since the framework region is conserved, it has proved relatively easy to model, whereas the CDRs, by their very nature (5), present a more challenging problem since accuracy in their modeling is of paramount importance. The approaches taken to modeling the antibody combining site, so far, fall into two groups: knowledge based and ab initio. Knowledge-based approaches have been used to model a number of antibodies, including J539 (6), GLOOP1-5 (5), HyHel-10 (7), and D1.3 (8). Although the methods differ in their detail, the common feature of all the approaches has been to examine only the known antibody crystal structures and select CDRs from these on the basis of length and/or sequence. Although most methods use simple sequence homology to select model loop conformations, Chothia and Lesk (9) have obtained better results by selecting conformations on the basis of the conservation of "key" residues that affect loop packing or conformation. However, the chief problem with any such method is the limited size of the knowledge base: while this has been extended to include all protein loops in the broader protein modeling field (10), a general data base has not previously been used in antibody modeling. The second approach has been to use ab initio conformational search algorithms to saturate the conformational space available to a loop and select an appropriate structure on the basis of its energy, calculated by using an empirical energy function (11). Whereas this overcomes the limited size of the knowledge base, it fails to make use of the valuable information available in the structural data base and, consequently, is extremely expensive in computer time. 
Representative are the conformational search methods of Moult and James (12) and Bruccoleri and Karplus (13) and the random conformation dynamics method of Fine et al. (14). To date, the knowledge-based and ab initio approaches have had no more than limited success; they are not routinely able to construct all six CDRs with a high level of accuracy. The knowledge-based model of J539 (6), when compared with the crystal structure (15), shows rms deviations of 1.1-4.0 Å (backbone) and 2.0-6.5 Å (all atoms). Chothia's model of D1.3 (8) is better, with rms deviations of 0.50-0.97 Å (backbone) for five of the six loops; CDR-H1 (H refers to the heavy chain) has an rms of 2.07 Å. (All-atom rms deviations have not been published.) The conformational search model for HyHel-5 (16) using the program CONGEN (13) shows rms deviations of 0.5-2.1 Å (backbone) and 1.7-4.1 Å (all atoms), whereas that for McPC603 (16) shows deviations of 0.7-2.6 Å (backbone) and 1.4-3.3 Å (all atoms). The recent crystal structure for the anti-lysozyme antibody Gloop2 (Phil Jeffrey, Garry Taylor, Robert Griest, Steven Sheriff, and A.R.R., unpublished data) has enabled us to evaluate the published knowledge-based maximum overlap model for Gloop2 from this laboratory (5). In four of the six loops, the agreement between the two is good: 0.64-0.80 Å (backbone) and 0.97-1.95 Å (all atoms). For the two remaining loops (CDR-H2 and CDR-H3), however, the rms deviations are 1.77 Å and 3.61 Å (backbone) and 2.98 Å and 5.48 Å (all atoms), respectively. The poorest loop, CDR-H3, is much shorter than that observed in any of the crystal structures used to create the model and thus required a major deletion to be made during the modeling. It is thus not surprising, given the unreliability of such manual deletions and insertions (17), that the conformation predicted for this loop is wrong. It should also be noted that the loops modeled in Gloop2 follow the sequence definition of Kabat et al. 
(18) and not the structural definition of Chothia and Lesk (9), which forms the basis of the rms deviations cited for the other models (see Discussion). Thus, it is clear that neither knowledge-based nor ab initio methods, when used alone, allow the accurate construction of all six hypervariable loops on a routine basis. We have combined the two methods to produce a protocol that overcomes the limited size of the knowledge base by using conformational searching but does not ignore the information available in the structural data base.

Abbreviation: CDR, complementarity-determining region.
*To whom reprint requests should be sent.
†Current address (until April 1990): Igen Inc., 1500 East Jefferson Street, Rockville, MD 20852.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Proc. Natl. Acad. Sci. USA 86 (1989)
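The rms deviations quoted above for the published models compare the atomic positions of a modeled loop with those of the same loop in the crystal structure. As an illustration only, the quantity can be sketched in Python as below. The sketch assumes that the two structures have already been superposed (for example, on the conserved framework) and that both atom lists are in the same order; the function name `rmsd` and the coordinates are hypothetical and are not taken from any of the cited studies.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists.

    Assumes the structures are already superposed; no fitting is done here.
    Each coordinate is an (x, y, z) tuple in angstroms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical coordinates for three backbone atoms (illustrative values only)
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0)]
crystal = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (2.0, 1.4, 1.0)]

print(rmsd(model, crystal))  # every atom is displaced by 1 Å along z, so this prints 1.0
```

A backbone value would be obtained by passing only the backbone atoms of the loop, and an all-atom value by passing every atom, which is why the all-atom figures quoted for a given loop are generally the larger of the two.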