Comparing Chemistry to Outcome: The Development of a Chemical Distance Metric, Coupled with Clustering and Hierarchal Visualization Applied to Macromolecular Crystallography

Many bioscience fields employ high-throughput methods to screen multiple biochemical conditions, and analyzing the results becomes tedious without a degree of automation. Crystallization, a rate-limiting step in biological X-ray crystallography, is one such field. Screening many potential crystallization conditions (cocktails) is the most effective way to probe a protein's phase diagram and guide crystallization, but interpreting the results can be time-consuming. To aid this empirical approach, a cocktail distance coefficient was developed to quantitatively compare macromolecule crystallization conditions and outcomes. The coefficient was evaluated against an existing similarity metric developed for crystallization, the C6 metric, using both virtual crystallization screens and a comparison of two related 1,536-cocktail high-throughput crystallization screens. Hierarchical clustering was used to visualize one of these screens, and the crystallization results for an exopolyphosphatase-related protein from Bacteroides fragilis (BfR192) were overlaid on this clustering. This revealed a strong correlation between certain chemically related clusters and crystal lead conditions. Although this analysis was not used to guide the initial crystallization optimization, it led to the re-evaluation of unexplained peaks in the electron density map of the protein and to the insertion and correct placement of sodium, potassium and phosphate ions in the structure. With these in place, the putative active site of the resulting structure showed features consistent with the active sites of other phosphatases that bind the phosphoryl moieties of nucleotide triphosphates. The new distance coefficient, CDcoeff, appears to be robust in this application and, coupled with hierarchical clustering and the overlay of crystallization outcome, reveals information of biological relevance.
Although it has been tested with only a single example, the approach shows promising potential applications in crystallography, and the distance coefficient, clustering and hierarchical visualization of results undoubtedly have applications in wider fields.
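This abstract does not define how CDcoeff is computed, so the sketch below only illustrates the general workflow it describes. That workflow scores every pair of cocktails with a distance, clusters the screen hierarchically and overlays crystallization outcomes on the clusters. The sketch assumes each cocktail is a dictionary of chemical names and concentrations, and it substitutes the standard Bray-Curtis dissimilarity for CDcoeff. The function names (`bray_curtis`, `average_linkage`, `hit_rate`), the four-cocktail screen and the hit set are all hypothetical.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two cocktails.

    Each cocktail is a dict of chemical name -> concentration (one consistent
    unit per chemical); a chemical missing from a cocktail counts as zero.
    Returns 0.0 for identical cocktails and 1.0 when no chemical is shared.
    Illustrative stand-in only: this is NOT the published CDcoeff.
    """
    chemicals = set(a) | set(b)
    num = sum(abs(a.get(c, 0.0) - b.get(c, 0.0)) for c in chemicals)
    den = sum(a.get(c, 0.0) + b.get(c, 0.0) for c in chemicals)
    return num / den if den else 0.0


def average_linkage(cocktails, threshold):
    """Agglomerative clustering with average (UPGMA) linkage.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance and stops once that distance exceeds `threshold`.
    Returns the clusters as a sorted list of sorted cocktail-name lists.
    """
    names = list(cocktails)
    # Precompute every pairwise cocktail distance once.
    dist = {
        frozenset(pair): bray_curtis(cocktails[pair[0]], cocktails[pair[1]])
        for pair in combinations(names, 2)
    }
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        # Mean distance over all cross-cluster pairs of cocktails.
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


def hit_rate(clusters, hits):
    """Overlay outcome: fraction of cocktails in each cluster that gave a hit."""
    return [sum(name in hits for name in c) / len(c) for c in clusters]


# Hypothetical four-cocktail screen and crystal hits, for illustration only.
screen = {
    "A1": {"NaCl": 1.0, "NaH2PO4": 0.1},
    "A2": {"NaCl": 1.2, "NaH2PO4": 0.1},
    "B1": {"PEG3350": 20.0, "Tris": 0.1},
    "B2": {"PEG3350": 25.0, "Tris": 0.1},
}
hits = {"A1", "A2"}

clusters = average_linkage(screen, threshold=0.5)
print(clusters)                   # [['A1', 'A2'], ['B1', 'B2']]
print(hit_rate(clusters, hits))   # [1.0, 0.0]
```

The clustering uses only the standard library so that the sketch is self-contained. It recomputes linkages on every pass, so it suits only small illustrative screens. For a full 1,536-cocktail screen, a more practical route would be to build a concentration matrix and pass the condensed output of `scipy.spatial.distance.pdist` (which offers a `braycurtis` metric) to `scipy.cluster.hierarchy.linkage` with `method="average"`.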
