Solving the Secondary Structure Matching Problem in Cryo-EM De Novo Modeling Using a Constrained $K$-Shortest Path Graph Algorithm

Electron cryomicroscopy (cryo-EM) is becoming a major experimental technique for solving the structures of large molecular assemblies. A growing number of three-dimensional images have been obtained at medium resolutions between 5 and 10 Å. In this resolution range, major α-helices can be detected as cylindrical sticks and β-sheets as plane-like regions. A critical question in de novo modeling from cryo-EM images is how to match the secondary structures detected in the image with those on the protein sequence. We formulate this matching problem as a constrained graph problem and present an $O(\Delta^2 N^2 2^N)$ algorithm for this NP-hard problem. The algorithm incorporates dynamic programming into a constrained $K$-shortest path algorithm. Our method, DP-TOSS, has been tested on α-proteins with up to 33 helices and α-β proteins with up to five helices and 12 β-strands. The correct match was ranked within the top 35 for 19 of the 20 α-proteins and for all nine α-β proteins tested. The results demonstrate that DP-TOSS improves accuracy, running time, and memory usage in deriving the topologies of the secondary structure elements for proteins with a large number of secondary structures and a complex skeleton.
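To make the idea of combining dynamic programming with a ranked (K-best) path search concrete, the following is a minimal toy sketch and not the DP-TOSS implementation. It assumes that every sequence segment is matched to exactly one detected stick, that each stick may be used only once, and that the total cost is a sum of per-assignment and per-loop terms. The function name `k_best_topologies`, the cost tables, and the `gap` matrix are hypothetical and chosen only for illustration.

```python
from heapq import nsmallest


def k_best_topologies(node_cost, edge_cost, k):
    """Rank the k cheapest segment-to-stick assignments (toy model).

    node_cost[i][s][d]: cost of placing sequence segment i on stick s
                        with direction d (0 or 1).
    edge_cost(i, sp, dp, s, d): cost of the loop joining segment i-1
                        (on stick sp, direction dp) to segment i.
    Returns up to k (cost, path) pairs sorted by cost, where path is a
    tuple of (stick, direction) pairs in sequence order.
    """
    n = len(node_cost)
    # DP state: (bitmask of used sticks, last stick, last direction).
    # Each state stores its k cheapest partial paths.
    layer = {}
    for s in range(n):
        for d in (0, 1):
            layer[(1 << s, s, d)] = [(node_cost[0][s][d], ((s, d),))]

    for i in range(1, n):
        nxt = {}
        for (mask, sp, dp), partials in layer.items():
            for s in range(n):
                if mask & (1 << s):
                    continue  # constraint: a stick is used at most once
                for d in (0, 1):
                    step = edge_cost(i, sp, dp, s, d) + node_cost[i][s][d]
                    key = (mask | (1 << s), s, d)
                    nxt.setdefault(key, []).extend(
                        (c + step, p + ((s, d),)) for c, p in partials
                    )
        # Keeping only the k best per state is safe because the cost of
        # any completion depends only on the state, not on the prefix.
        layer = {key: nsmallest(k, cands) for key, cands in nxt.items()}

    finals = [cp for partials in layer.values() for cp in partials]
    return nsmallest(k, finals)


# Hypothetical toy data: three sequence segments and three sticks.
seg_len = [10, 18, 7]      # assumed segment lengths on the sequence
stick_len = [17, 8, 11]    # assumed stick lengths in the density map
node_cost = [
    [[abs(seg_len[i] - stick_len[s])] * 2 for s in range(3)]
    for i in range(3)
]
gap = [[0, 4, 9], [4, 0, 3], [9, 3, 0]]  # assumed stick-to-stick distances


def edge_cost(i, sp, dp, s, d):
    # Toy loop cost: spatial gap plus a small penalty for equal directions.
    return gap[sp][s] + (1 if dp == d else 0)


for cost, path in k_best_topologies(node_cost, edge_cost, 5):
    print(cost, path)
```

The bitmask state is what introduces an exponential factor in the number of sticks in this sketch. It is meant only to show why a used-once constraint turns a plain shortest-path ranking into a harder problem, and it does not reproduce the graph construction or the complexity bound stated above.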