Using Machine Learning to Explore the Relevance of Local and Global Features During Conformational Search in Rosetta

Our ongoing work focuses on improving the exploration behaviour of heuristic search techniques in fragment-assembly methods for protein structure prediction. Analysing and improving exploration in fragment assembly is difficult because diversity between decoys is hard to measure in a meaningful way. Here, we define a set of local and global features of decoy structures generated by Rosetta, and we use machine learning to explore the extent to which these features are predictive of the final prediction results achieved by individual runs. The aim is to identify those feature subsets that show a significant correlation with the final prediction outcome, and to identify when they become fixed during the search. Such features may help in formulating new diversity measures for use in explicit diversity mechanisms such as crowding or external archives. The time at which features become fixed can help in deciding at what stage of the search diversity mechanisms are most relevant.
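The two questions posed above (how strongly a feature relates to the final outcome, and when it becomes fixed) can be illustrated with a minimal sketch. It is not the method used in this work: it replaces the machine learning model with a plain Pearson correlation computed at each checkpoint, and it runs on synthetic trajectories rather than Rosetta output. The binary feature, the checkpoint count, the outcome model and all function names are hypothetical and chosen only for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def fixation_checkpoint(trajectory):
    """Earliest checkpoint index from which the feature keeps its final value."""
    final = trajectory[-1]
    idx = len(trajectory) - 1
    while idx > 0 and trajectory[idx - 1] == final:
        idx -= 1
    return idx


def simulate_runs(n_runs=500, n_checkpoints=10, seed=0):
    """Synthetic stand-in for search runs (assumption, not Rosetta data).

    Each run tracks one binary feature that fluctuates randomly and is
    fixed no later than a randomly chosen checkpoint. The final outcome
    (an RMSD-like score, lower is better) depends on the fixed value.
    """
    rng = random.Random(seed)
    features, outcomes = [], []
    for _ in range(n_runs):
        fix_at = rng.randrange(1, n_checkpoints)
        final_value = rng.randrange(2)
        trajectory = [rng.randrange(2) for _ in range(fix_at)]
        trajectory += [final_value] * (n_checkpoints - fix_at)
        features.append(trajectory)
        outcomes.append(8.0 - 3.0 * final_value + rng.gauss(0.0, 1.0))
    return features, outcomes


def correlation_by_checkpoint(features, outcomes):
    """Correlation between the feature at each checkpoint and the final outcome."""
    n_checkpoints = len(features[0])
    return [
        pearson([run[t] for run in features], outcomes)
        for t in range(n_checkpoints)
    ]


if __name__ == "__main__":
    features, outcomes = simulate_runs()
    for t, r in enumerate(correlation_by_checkpoint(features, outcomes)):
        print(f"checkpoint {t}: r = {r:+.2f}")
    mean_fix = sum(fixation_checkpoint(run) for run in features) / len(features)
    print(f"mean fixation checkpoint: {mean_fix:.1f}")
```

In this toy setting the correlation with the final outcome strengthens as more runs have fixed the feature, which is the kind of signal that could indicate when a diversity mechanism would still have an effect.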
