Ranked solutions to a class of combinatorial optimizations - with applications in mass spectrometry based peptide sequencing

Typical combinatorial optimization problems are NP-hard; however, for a particular class of cost functions, the corresponding problems can be solved in polynomial time using the transfer matrix technique or, equivalently, dynamic programming. This suggests a way to find approximate solutions efficiently: find a transformation that makes the cost function as similar as possible to one in the solvable class. After retaining many high-ranking solutions under the approximate cost function, one may then re-assess these solutions with the full cost function to find the best approximate solution. Under this approach, it is important to be able to assess the quality of the solutions obtained, e.g., by finding the true rank that the kth best approximate solution would have if all possible solutions were considered exhaustively. To tackle this statistical issue, we provide a systematic method that starts with a scaling function generated from the finite number of high-ranking solutions and then applies a convergent iterative mapping. This method, useful in a variant of the directed paths in random media problem proposed here, can also provide a statistical significance assessment for one of the most important proteomic tasks: peptide sequencing using tandem mass spectrometry data. For directed paths in random media, the scaling function depends on the particular realization of randomness; in the mass spectrometry case, the scaling function is spectrum-specific.
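
The two-stage idea described above (keep many high-ranking solutions under a solvable cost, then re-assess them with the full cost) can be illustrated with a minimal Python sketch. It is not the method of this work: the lattice layout, the names `k_best_paths`, `rerank`, and `full_cost`, and the non-additive penalty term are all assumptions chosen for illustration. The sketch assumes a directed path that moves one step forward at a time and shifts its transverse position by at most one site, with an additive site cost that dynamic programming can rank exactly. It does not attempt the scaling function or the iterative mapping.

```python
import heapq


def k_best_paths(energy, k):
    """Return the k lowest-cost directed paths as sorted (cost, path) tuples.

    energy[t][x] is the (assumed additive) site cost at step t, position x.
    A path may move from position x to x - 1, x, or x + 1 at each step.
    """
    width = len(energy[0])
    # best[x] holds up to k (cost, path) pairs for paths ending at position x.
    best = [[(energy[0][x], (x,))] for x in range(width)]
    for t in range(1, len(energy)):
        new_best = []
        for x in range(width):
            # Extend every retained path from each allowed predecessor site.
            candidates = [
                (cost + energy[t][x], path + (x,))
                for px in (x - 1, x, x + 1)
                if 0 <= px < width
                for cost, path in best[px]
            ]
            new_best.append(heapq.nsmallest(k, candidates))
        best = new_best
    return heapq.nsmallest(k, [item for site in best for item in site])


def rerank(candidates, full_cost):
    """Re-score candidate paths with the full cost and sort ascending."""
    return sorted((full_cost(path), path) for _, path in candidates)


# Hypothetical toy instance: 3 steps, 3 transverse positions.
energy = [[1, 5, 2], [4, 1, 3], [2, 2, 0]]


def full_cost(path):
    # Hypothetical "full" cost: the additive part plus a global penalty on the
    # number of distinct positions visited, which is not a sum over sites.
    additive = sum(energy[t][x] for t, x in enumerate(path))
    return additive + len(set(path)) ** 2


approximate = k_best_paths(energy, k=3)
for cost, path in rerank(approximate, full_cost):
    print(cost, path)
```

Because only the top k candidates survive the first stage, the best path after re-ranking is not guaranteed to be the global optimum of the full cost; that gap is what the rank assessment discussed above is meant to quantify.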