Development of Lossless Compression Techniques for Biology Information and Its Application for Bioinformatics Database Retrieval

We are developing lossless encoding techniques for multimedia data such as sound [1], images, and video. This presentation introduces “G-encoder”, an integrated lossless encoder that supports several biological data formats, including DNA sequences, amino-acid sequences, protein 3-D structures, and microarray analysis images. We found that the proposed tool achieved a higher compression ratio than conventional universal lossless compression tools such as GZIP, and also than several previously reported methods [2, 4]. In this paper, we outline the proposed encoding algorithm and present experimental results.
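
To illustrate why a format-aware encoder can outperform a universal tool such as GZIP on DNA sequences, the following is a minimal sketch of fixed 2-bit packing of nucleotides. It is not the G-encoder algorithm, which this abstract does not specify. The names `pack_dna`, `unpack_dna`, and `compare_sizes`, the 4-byte length header, and the restriction to the four symbols A, C, G, and T are all assumptions made for illustration.

```python
import gzip

# Hypothetical 2-bit code for the four nucleotides (illustrative assumption,
# not taken from the G-encoder).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_dna(seq: str) -> bytes:
    """Pack an A/C/G/T string at 2 bits per base, after a 4-byte length header."""
    out = bytearray(len(seq).to_bytes(4, "big"))
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a final chunk that holds fewer than four bases.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_dna(data: bytes) -> str:
    """Invert pack_dna, using the length header to drop padding bases."""
    n = int.from_bytes(data[:4], "big")
    bases = []
    for byte in data[4:]:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:n])


def compare_sizes(seq: str) -> tuple[int, int, int]:
    """Return (raw, gzip, packed) sizes in bytes for the same sequence."""
    raw = seq.encode("ascii")
    return len(raw), len(gzip.compress(raw)), len(pack_dna(seq))
```

Calling `compare_sizes` on a sequence gives the raw, GZIP, and 2-bit sizes side by side. The packed size is always 4 bytes plus one byte per four bases, whereas the GZIP size depends on the content of the sequence. Real sequence data also contains ambiguity codes such as N, which this sketch does not handle.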

[1]  Khalid Sayood. Lossless Compression Handbook, 2003.

[2]  Hugh E. Williams, et al. Compression of nucleotide databases for fast searching, 1997, Comput. Appl. Biosci.

[3]  Toshiko Matsumoto, et al. Biological sequence compression algorithms, 2000, Genome Informatics. Workshop on Genome Informatics.