Statistical methods for the objective design of screening procedures for macromolecular crystallization.

The crystallization of a new macromolecule is still very much a trial-and-error process. It requires searching a large parameter space of experimental settings to find the relatively few idiosyncratic conditions that lead to diffraction-quality crystals. Crystallographers have developed a variety of screens to help identify initial crystallization conditions, including those based on systematic grids, incomplete factorial designs and sparse-matrix approaches. These screens are formulated somewhat subjectively from accumulated data on past crystallization experiments. Ideally, one would prefer a procedure that is as objective as possible; this, however, requires objective methods that draw on a broad source of crystallization data. The Biological Macromolecular Crystallization Database (BMCD), a repository of all published crystallization conditions, is an obvious source of these data. This database has been augmented with a hierarchical classification of the macromolecules contained in the BMCD, as well as extensive data on the additives used with them. A statistical analysis of the augmented BMCD shows significant correlations between families of macromolecules and the experimental conditions under which they crystallize. This in turn leads to a Bayesian technique for determining the probability of success of a set of experimental conditions, based on the data in the BMCD together with facts about a macromolecule that are known prior to crystallization. The technique has been incorporated into software that enables users to rank experimental conditions, generated by a dense partial factorial design, for new macromolecules. Finally, an additional advantage of the software described here is that it facilitates the accumulation of the data required to improve the accuracy of the estimated probabilities of success: knowledge of the conditions that lead to failure of crystallization.
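
As a rough illustration of the Bayesian ranking idea, the Python sketch below estimates a probability of success for each candidate condition given a macromolecule family, and sorts the candidates by that estimate. It is only a sketch under stated assumptions: the trial records, family names, condition factors, the naive independence assumption between factors and the Laplace smoothing are all hypothetical choices made for the example. They do not reflect the BMCD schema or the method used in the software described above. For simplicity the candidates form a full grid; in practice they would come from the partial factorial design.

```python
from itertools import product

# Hypothetical trial records: (family, precipitant, pH bin, crystallized?).
# These are invented for illustration and are not taken from the BMCD.
TRIALS = [
    ("lysozyme-like", "NaCl", "acidic", True),
    ("lysozyme-like", "NaCl", "neutral", True),
    ("lysozyme-like", "PEG", "neutral", False),
    ("lysozyme-like", "NaCl", "acidic", True),
    ("kinase", "PEG", "neutral", True),
    ("kinase", "NaCl", "acidic", False),
    ("kinase", "PEG", "basic", True),
    ("kinase", "PEG", "acidic", False),
]


def success_probability(family, condition, trials, alpha=1.0):
    """Estimate P(success | family, condition) with a naive Bayes model.

    Assumes the condition factors are independent given the outcome
    (a simplification) and applies Laplace smoothing with weight alpha.
    """
    rows = [t for t in trials if t[0] == family]
    score = {}
    for outcome in (True, False):
        subset = [t for t in rows if t[3] == outcome]
        # Smoothed prior for this outcome within the family.
        p = (len(subset) + alpha) / (len(rows) + 2 * alpha)
        for idx, value in enumerate(condition, start=1):
            # Number of distinct levels seen for this factor in all trials.
            levels = {t[idx] for t in trials}
            count = sum(1 for t in subset if t[idx] == value)
            # Smoothed likelihood of this factor level given the outcome.
            p *= (count + alpha) / (len(subset) + alpha * len(levels))
        score[outcome] = p
    # Normalize over the two outcomes to obtain a posterior probability.
    return score[True] / (score[True] + score[False])


def rank_conditions(family, trials):
    """Return all candidate conditions, most promising first."""
    precipitants = sorted({t[1] for t in trials})
    ph_bins = sorted({t[2] for t in trials})
    candidates = list(product(precipitants, ph_bins))
    return sorted(
        candidates,
        key=lambda c: success_probability(family, c, trials),
        reverse=True,
    )


if __name__ == "__main__":
    for cond in rank_conditions("kinase", TRIALS):
        print(cond, round(success_probability("kinase", cond, TRIALS), 3))
```

Note that the estimate depends on the failed trials as much as on the successful ones: without recorded failures, the likelihoods for the unsuccessful outcome would rest on smoothing alone, which is why accumulating failure data matters for this kind of model.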