A Combinatorial Toolbox for Protein Sequence Design and Landscape Analysis in the Grand Canonical Model

One of the most important research problems in modern biology is to understand how protein sequences fold into their native 3D structures. To investigate this problem at a high level, one wishes to analyze protein landscapes, i.e., the structure of the space of all protein sequences and their native 3D structures. Perhaps the most basic computational problem at this level is to take a target 3D structure as input and design a fittest protein sequence with respect to one or more fitness functions of the target 3D structure. We develop a toolbox of combinatorial techniques for protein landscape analysis in the Grand Canonical model of Sun, Brem, Chan, and Dill. The toolbox is based on linear programming, network flow, and a linear-size representation of all minimum cuts of a network. It substantially extends the network flow technique for protein sequence design in Kleinberg's seminal work and also applies to a considerably broader collection of computational problems than those considered by Kleinberg. We have used this toolbox to obtain a number of efficient algorithms and hardness results. We have further used the algorithms to analyze 3D structures drawn from the Protein Data Bank and have discovered some novel relationships between such native 3D structures and the Grand Canonical model.
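To make the network flow idea concrete, the following is a minimal sketch of a minimum-cut formulation of sequence design over a hydrophobic/polar (H/P) alphabet. It is not the toolbox described above. It assumes the fitness of a sequence is a sum of per-residue penalties for placing H at exposed positions minus rewards for pairs of nearby H residues, which is our reading of the Grand Canonical fitness function and is not stated in the abstract. The names `design_sequence`, `pair_gain`, and `burial_cost` are hypothetical, and the numeric inputs are made up for illustration.

```python
from collections import defaultdict, deque

INF = float("inf")

def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max flow. Returns (flow value, nodes on the source side of a min cut)."""
    res = defaultdict(lambda: defaultdict(float))  # residual capacities
    for u in cap:
        for v, c in cap[u].items():
            res[u][v] += c
            res[v][u] += 0.0  # make sure the reverse edge exists
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # No augmenting path: the reachable set is the source side of a min cut.
            return flow, set(parent)
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck

def design_sequence(pair_gain, burial_cost):
    """Return (H/P string, minimum fitness) under the assumed fitness form.

    pair_gain[(i, j)]: nonnegative reward when residues i and j are both H (assumed input).
    burial_cost[i]: nonnegative penalty for making residue i hydrophobic (assumed input).
    """
    s, t = "s", "t"
    cap = defaultdict(dict)
    for i, cost in enumerate(burial_cost):
        cap[("r", i)][t] = cost              # cut this edge = pay for residue i being H
    for (i, j), gain in pair_gain.items():
        pair = ("p", i, j)
        cap[s][pair] = gain                  # cut this edge = forgo the pair reward
        cap[pair][("r", i)] = INF            # a rewarded pair forces both residues to be H
        cap[pair][("r", j)] = INF
    cut, side = min_cut_source_side(cap, s, t)
    seq = "".join("H" if ("r", i) in side else "P" for i in range(len(burial_cost)))
    return seq, cut - sum(pair_gain.values())

# Made-up toy instance with four residues.
seq, fitness = design_sequence({(0, 1): 3.0, (1, 2): 2.0, (0, 3): 0.25},
                               [1.0, 1.0, 5.0, 0.5])
print(seq, fitness)  # HHPP -1.0
```

In this sketch the residues on the source side of the minimum cut are the ones assigned H. The reduction is the standard project-selection construction, so any maximum flow routine could replace the simple Edmonds-Karp loop used here.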
