Compression of LC/MS proteomic data

Summary form only given. The unrelenting growth of liquid chromatography-mass spectrometry (LC-MS) proteomic data, now gigabytes per sample and terabytes per experiment, motivates this investigation into compression methods suited to MS signal sources. Compression is needed to facilitate storage, searching, archiving, retrieval, and communication of proteomic MS data. We demonstrate compression techniques that reduce the average file size by a factor of 25 without any loss of accuracy. We have designed two main methods to encode the MS data. The first method predicts the mass-to-charge ratio based on the intensity values and encodes the residual with bzip2. The second method maps the original intensity values onto a universal grid and then either encodes them directly with bzip2 or applies run-length coding followed by an arithmetic coder. The latter method achieves the highest compression ratios.
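As an illustration of the second method, the following is a minimal sketch, not the authors' implementation. It assumes a uniformly spaced m/z grid, unsigned 32-bit integer intensities, and peaks whose m/z values already fall on grid points (otherwise the rounding step would lose m/z precision). The names `to_grid`, `run_length_encode`, `compress_direct`, and `compress_rle` are hypothetical. Because an arithmetic coder is too long to show here, Python's standard `bz2` module stands in for the entropy-coding stage in both variants.

```python
import bz2
import struct


def to_grid(peaks, mz_min, step, n_bins):
    """Place integer intensities on a fixed m/z grid (assumed uniform spacing)."""
    grid = [0] * n_bins
    for mz, intensity in peaks:
        idx = round((mz - mz_min) / step)
        if 0 <= idx < n_bins:
            grid[idx] = intensity
    return grid


def run_length_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def pack_u32(values):
    """Serialize non-negative integers as little-endian unsigned 32-bit words."""
    return struct.pack(f"<{len(values)}I", *values)


def unpack_u32(raw):
    """Inverse of pack_u32."""
    return list(struct.unpack(f"<{len(raw) // 4}I", raw))


def compress_direct(grid):
    """Variant 1: bzip2 applied directly to the gridded intensities."""
    return bz2.compress(pack_u32(grid))


def decompress_direct(blob):
    return unpack_u32(bz2.decompress(blob))


def compress_rle(grid):
    """Variant 2: run-length coding first, then an entropy coder.

    bz2 is a stand-in here; the abstract describes an arithmetic coder.
    """
    flat = [x for pair in run_length_encode(grid) for x in pair]
    return bz2.compress(pack_u32(flat))


def decompress_rle(blob):
    flat = unpack_u32(bz2.decompress(blob))
    return run_length_decode(list(zip(flat[0::2], flat[1::2])))
```

A gridded spectrum is mostly zeros between peaks, which is why run-length coding before the entropy coder can pay off. Both variants in this sketch are lossless with respect to the gridded intensities: `decompress_rle(compress_rle(grid))` and `decompress_direct(compress_direct(grid))` each return the original `grid`.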