More Reliable Protein NMR Peak Assignment via Improved 2-Interval Scheduling

Protein NMR peak assignment is the process of assigning a group of experimentally obtained spin systems to a protein sequence of amino acids. Automating this process remains an unsolved and challenging problem in NMR protein structure determination. Recently, protein backbone NMR peak assignment has been formulated as an interval scheduling problem: a protein sequence P of amino acids is viewed as a discrete time interval I (the amino acids of P correspond one-to-one to the time units of I); each subset S of spin systems known to originate from consecutive amino acids of P is viewed as a job j_S; the preference for assigning S to a subsequence P' of consecutive amino acids of P is viewed as the profit of executing job j_S in the subinterval of I corresponding to P'; and the goal is to maximize the total profit of executing the jobs (on a single machine) during I. The interval scheduling problem is MAX SNP-hard in general. Typically, the jobs that require one or two consecutive time units are the most difficult to assign/schedule. To solve these most difficult assignments, we present an efficient 13/7-approximation algorithm. Combining this algorithm with a greedy filtering strategy for handling long jobs (i.e., jobs that need more than two consecutive time units), we obtain a new efficient heuristic for protein NMR peak assignment. Our study using experimental data shows that, in most cases, the new heuristic produces the best peak assignment among the NMR peak assignment algorithms in the literature. The 13/7-approximation algorithm is also the first approximation algorithm for a nontrivial case of the classical (weighted) interval scheduling problem that breaks the ratio-2 barrier.
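To make the scheduling formulation concrete, the following is a minimal sketch of the problem itself, not of the 13/7-approximation algorithm or the greedy filtering heuristic described above. It solves a tiny instance exactly by exhaustive search with memoization, which is only feasible for a handful of jobs. The function name `max_profit_schedule`, the data layout (each job as a length plus a dictionary of start position to profit), and the numbers in the example are all illustrative assumptions.

```python
from functools import lru_cache


def max_profit_schedule(n, jobs):
    """Exact solver for a tiny instance of the scheduling formulation.

    n    -- number of time units in the interval I (positions 0..n-1)
    jobs -- list of (length, profits); profits maps a start position to the
            profit of running the job there (missing key = not allowed)

    Each job runs at most once and scheduled jobs must not overlap.
    Returns (total_profit, plan), where plan is a tuple of (job_index, start).
    Running time is exponential in the number of jobs (illustration only).
    """

    @lru_cache(maxsize=None)
    def best(pos, used):
        if pos >= n:
            return 0, ()
        # Option 1: leave time unit `pos` idle.
        value, plan = best(pos + 1, used)
        # Option 2: start some unused job at `pos`.
        for j, (length, profits) in enumerate(jobs):
            if (used >> j) & 1 or pos + length > n:
                continue
            profit = profits.get(pos)
            if profit is None:
                continue
            sub_value, sub_plan = best(pos + length, used | (1 << j))
            if profit + sub_value > value:
                value = profit + sub_value
                plan = ((j, pos),) + sub_plan
        return value, plan

    return best(0, 0)


# Hypothetical instance: 4 time units (amino acids), three short jobs.
example_jobs = [
    (2, {0: 5, 2: 3}),        # job 0 needs two consecutive time units
    (1, {0: 2, 1: 4, 3: 1}),  # job 1 needs one time unit
    (2, {1: 6, 2: 4}),        # job 2 needs two consecutive time units
]
value, plan = max_profit_schedule(4, example_jobs)
print(value, plan)  # 9 ((0, 0), (2, 2))
```

In this instance the optimum runs job 0 at positions 0 and 1 and job 2 at positions 2 and 3, leaving job 1 unscheduled. All three jobs need one or two time units, which is the case the abstract singles out as the hardest to schedule.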
