Probabilistic cross-link analysis and experiment planning for high-throughput elucidation of protein structure

Emerging high-throughput techniques for characterizing the structures of proteins and protein complexes yield noisy data with sparse information content, placing a significant burden on computational methods to interpret the experimental data correctly. One such technique uses cross-linking (chemical, or by cysteine oxidation) to confirm or select among proposed structural models (e.g., from fold recognition, ab initio prediction, or docking) by testing the consistency between the cross-linking data and each model's geometry. This paper develops a probabilistic framework for analyzing the information content of cross-linking experiments while accounting for anticipated experimental error. The framework supports a mechanism for planning experiments to optimize the information gained. We evaluate candidate experiment plans using explicit trade-offs among key properties of practical importance: discriminability, coverage, balance, ambiguity, and cost. We devise a greedy algorithm that considers these properties and rapidly selects, from a large number of combinatorial possibilities, sets of experiments expected to discriminate efficiently between pairs of models. In an application to residue-specific chemical cross-linking, we demonstrate that our approach effectively plans experiments involving combinations of cross-linkers and introduced mutations. We also describe an experiment plan for the bacteriophage λ Tfa chaperone protein, in which we design dicysteine mutants to discriminate among threading models by disulfide formation. Preliminary results from a subset of the planned experiments are consistent and demonstrate the practicality of planning. Our methods provide the experimenter with a valuable tool (available from the authors) for understanding and optimizing cross-linking experiments.
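
The abstract does not spell out its scoring functions, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes that each candidate experiment (for example, a cross-linker and residue-pair combination, or a dicysteine mutant) either is or is not predicted to cross-link by each structural model; that observed outcomes are corrupted by a hypothetical false-positive rate `fp` and false-negative rate `fn`; that an experiment's power to discriminate two models is the difference in their predicted cross-link probabilities; and that experiments are chosen greedily by remaining discrimination gained per unit cost, in the style of greedy set cover. The names `p_link`, `discrimination`, and `greedy_plan`, the default error rates, and the `threshold` parameter are all hypothetical. Coverage, balance, and ambiguity are not modeled.

```python
from itertools import combinations


def p_link(predicted: bool, fp: float, fn: float) -> float:
    """Probability of observing a cross-link, given whether a model predicts one.

    Assumed error model (hypothetical): a predicted link is missed with
    probability fn; an unpredicted link is observed anyway with probability fp.
    """
    return 1.0 - fn if predicted else fp


def discrimination(exp, a, b, predictions, fp, fn):
    """Hypothetical score: gap between two models' predicted link probabilities."""
    return abs(p_link(predictions[a][exp], fp, fn) - p_link(predictions[b][exp], fp, fn))


def greedy_plan(experiments, predictions, cost, fp=0.1, fn=0.2, threshold=1.0):
    """Greedily choose experiments until every model pair reaches `threshold`.

    predictions: {model: {experiment: bool}}, i.e. whether each model predicts a link.
    cost: {experiment: positive cost}.
    Returns (chosen experiments in order, remaining discrimination needed per pair).
    """
    pairs = list(combinations(list(predictions), 2))
    need = {pair: threshold for pair in pairs}  # discrimination still required
    chosen, remaining = [], set(experiments)

    def gain(exp):
        # Count only discrimination a pair still needs, per unit cost.
        total = sum(min(need[p], discrimination(exp, *p, predictions, fp, fn)) for p in pairs)
        return total / cost[exp]

    while remaining and any(v > 1e-12 for v in need.values()):
        best = max(sorted(remaining), key=gain)  # sorted() makes ties deterministic
        if gain(best) <= 0:
            break  # no remaining experiment separates any unresolved pair
        for p in pairs:
            need[p] = max(0.0, need[p] - discrimination(best, *p, predictions, fp, fn))
        chosen.append(best)
        remaining.remove(best)
    return chosen, need
```

With a few toy models whose predictions differ on different experiments, `greedy_plan` returns a short list of experiments that separates every pair. A pair of models with identical predictions on all candidates is left in `need` at its full threshold, which flags it as indistinguishable by the available experiments rather than silently ignoring it.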
