A lossless FASTQ Quality Scores file compression algorithm based on linear combination prediction

In this paper, we propose a lossless compression method for the Quality Scores in FASTQ files, the format commonly used to store NGS (Next Generation Sequencing) data. Rather than designing elaborate data structures and compression techniques that operate directly on the original FASTQ Quality Scores, we use linear combination prediction to change the distribution of the Quality Scores so that they are better suited to further compression with existing compression algorithms. Experimental results on all test datasets indicate that our method outperforms four other state-of-the-art compression tools, achieving up to 10% improvement in compression ratio over the second-best tool and up to 35% over Gzip.
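The abstract does not specify the form of the predictor, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical predictor that estimates each score as an integer linear combination of the two preceding scores in the same read (`WEIGHTS = (3, 1)`, `DIV = 4`, both invented for illustration), stores the prediction residuals, and hands them to `zlib` (the DEFLATE algorithm that Gzip also uses). The helper names `predict`, `encode_read`, `decode_read`, `compress`, and `decompress` are likewise hypothetical.

```python
import zlib

# HYPOTHETICAL predictor parameters (not from the paper): q[j] is predicted as
# floor((3 * q[j-1] + 1 * q[j-2]) / 4) using integer arithmetic only.
WEIGHTS = (3, 1)  # weights applied to q[j-1] and q[j-2]
DIV = 4


def predict(prev1: int, prev2: int) -> int:
    # Integer-only prediction, so encoder and decoder compute identical values.
    return (WEIGHTS[0] * prev1 + WEIGHTS[1] * prev2) // DIV


def zigzag(r: int) -> int:
    # Map signed residuals to non-negative ints: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4.
    return 2 * r if r >= 0 else -2 * r - 1


def unzigzag(z: int) -> int:
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def encode_read(qual: str) -> bytes:
    # Phred+33 scores lie in 0..93, so residuals lie in -93..93 and fit in a byte.
    out = bytearray()
    p1 = p2 = 0  # assumed context before the first score of a read
    for c in qual:
        q = ord(c) - 33
        out.append(zigzag(q - predict(p1, p2)))
        p1, p2 = q, p1
    return bytes(out)


def decode_read(data: bytes) -> str:
    # Rebuild each score from the same prediction plus the stored residual.
    chars = []
    p1 = p2 = 0
    for z in data:
        q = predict(p1, p2) + unzigzag(z)
        chars.append(chr(q + 33))
        p1, p2 = q, p1
    return "".join(chars)


def compress(quals: list[str]) -> bytes:
    # Store read lengths in a small header so reads can be split apart again.
    lengths = ",".join(str(len(q)) for q in quals).encode()
    residuals = b"".join(encode_read(q) for q in quals)
    return zlib.compress(len(lengths).to_bytes(4, "big") + lengths + residuals, 9)


def decompress(blob: bytes) -> list[str]:
    raw = zlib.decompress(blob)
    n = int.from_bytes(raw[:4], "big")
    lengths = [int(x) for x in raw[4:4 + n].decode().split(",")] if n else []
    out, pos = [], 4 + n
    for length in lengths:
        out.append(decode_read(raw[pos:pos + length]))
        pos += length
    return out


if __name__ == "__main__":
    sample = ["IIIIHHGGFF#", "IIIHHHGGGFF", "JJJIIIHHHGG"]
    blob = compress(sample)
    assert decompress(blob) == sample  # lossless round trip
    baseline = zlib.compress("".join(sample).encode(), 9)
    print(len(blob), len(baseline))
```

Losslessness comes from the decoder recomputing exactly the same integer prediction from scores it has already decoded, so each residual maps back to a single original value. The intent of the transform is that residuals cluster near zero, giving the back-end compressor a more skewed distribution to exploit; on tiny inputs like the sample above the length header dominates, so any size comparison is only meaningful on real FASTQ files.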