An improved algorithm for the identification of proteoforms with OpenCL

With advances in electrospray ionization mass spectrometry, modern protein mass spectra exhibit many new features. To accommodate these features, ProteinGoggle provides confident identification of proteoforms using a new algorithm for in situ interpretation and database search of protein tandem mass spectra. In building the customized theoretical database of proteins and their dissociation fragment ions, the key challenge is processing the isotopic envelopes efficiently. This paper presents an improved algorithm for the identification of proteoforms based on independent assortment. Through redundancy reduction and caching with Memcached, performance improves dramatically while accuracy is maintained. We then adopt a hybrid parallel solution with OpenCL, which optimizes the independent assortment step and performs filtering in parallel, and achieve a further speedup over the improved algorithm. The experimental results show that this multi-level optimization strategy is effective and delivers high-performance, high-throughput computation of molecular isotopic envelopes.
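As a rough illustration of the kind of computation summarized above, the following minimal Python sketch builds an isotopic envelope by combining the isotope distributions of each element independently, reuses cached intermediate results, and discards negligible peaks. It is not ProteinGoggle's implementation. It assumes that "independent assortment" can be modeled as a convolution of per-element distributions; it uses `functools.lru_cache` as an in-process stand-in for Memcached; and the names `ISOTOPES`, `convolve`, `element_envelope`, and `isotopic_envelope`, the rounded abundance values, and the pruning threshold are all hypothetical. The OpenCL parallelization is not shown.

```python
from functools import lru_cache

# Rounded natural isotope abundances as (nominal mass offset, probability).
# Illustrative values only; a real tool would use exact masses.
ISOTOPES = {
    "C": ((0, 0.9893), (1, 0.0107)),
    "H": ((0, 0.999885), (1, 0.000115)),
    "N": ((0, 0.99636), (1, 0.00364)),
    "O": ((0, 0.99757), (1, 0.00038), (2, 0.00205)),
    "S": ((0, 0.9499), (1, 0.0075), (2, 0.0425), (4, 0.0001)),
}


def convolve(a, b, threshold=1e-12):
    """Combine two envelopes and drop peaks below `threshold` (filtering)."""
    out = {}
    for offset_a, prob_a in a:
        for offset_b, prob_b in b:
            key = offset_a + offset_b
            out[key] = out.get(key, 0.0) + prob_a * prob_b
    return tuple(sorted((m, p) for m, p in out.items() if p >= threshold))


@lru_cache(maxsize=None)  # assumption: stands in for a Memcached lookup
def element_envelope(symbol, count):
    """Envelope of `count` atoms of one element, built by repeated squaring."""
    if count == 0:
        return ((0, 1.0),)
    if count == 1:
        return ISOTOPES[symbol]
    half = element_envelope(symbol, count // 2)
    result = convolve(half, half)
    if count % 2:
        result = convolve(result, ISOTOPES[symbol])
    return result


def isotopic_envelope(formula):
    """Envelope of a molecule given as {element symbol: atom count}."""
    envelope = ((0, 1.0),)
    for symbol, count in sorted(formula.items()):
        envelope = convolve(envelope, element_envelope(symbol, count))
    return envelope


if __name__ == "__main__":
    glucose = {"C": 6, "H": 12, "O": 6}
    for offset, prob in isotopic_envelope(glucose)[:4]:
        print(f"M+{offset}: {prob:.6f}")
```

Because the per-element envelopes are cached, fragment ions that share element counts reuse earlier results instead of recomputing them, which is the intuition behind the redundancy reduction described above.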