Influence of Dictionary Size on the Lossless Compression of Microarray Images

A key challenge in the management of microarray data is the large size of images that constitute the output of microarray experiments. Therefore, only the expression values extracted from these experiments are generally made available. However, the extraction of expression data is effected by a variety of factors, such as the thresholds used for background intensity correction, method used for grid determination, and parameters used in foreground (spot)-background delineation. This information is not always available or consistent across experiments and impacts downstream data analysis. Furthermore, the lack of access to the image-based primary data often leads to costly replication of experiments. Currently, both lossy and lossless compression techniques have been developed for microarray images. While lossy algorithms deliver better compression, a significant advantage of the lossless techniques is that they guarantee against loss of information that is putatively of biological importance. A key challenge therefore is the development of more efficacious lossless compression techniques. Dictionary-based compression is one of the critical methods used in lossless microarray compression. However, the image-based microarray data has potentially infinite variability. So the selection and effect of the dictionary size on the compression rate is crucial. Our paper examines this problem and shows that increasing the dictionary size beyond a certain size, does not lead to better compression. Our investigations also point to strategies for determining the optimal dictionary size.

[1]  Ronald W. Davis,et al.  Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray , 1995, Science.

[2]  Yu Luo,et al.  Gridding and compression of microarray images , 2004, Proceedings. 2004 IEEE Computational Systems Bioinformatics Conference, 2004. CSB 2004..

[3]  L. Penland,et al.  Use of a cDNA microarray to analyse gene expression patterns in human cancer , 1996, Nature Genetics.

[4]  Mark R. Nelson,et al.  LZW data compression , 1989 .

[5]  D. Lockhart,et al.  Expression monitoring by hybridization to high-density oligonucleotide arrays , 1996, Nature Biotechnology.

[6]  Kannan Ramchandran,et al.  Microarray image compression: SLOCO and the effect of information loss , 2003, Signal Processing.

[7]  D. Botstein,et al.  Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[8]  Armando J. Pinho,et al.  On the use of standards for microarray lossless image compression , 2006, IEEE Transactions on Biomedical Engineering.

[9]  Rahul Singh,et al.  MACE: lossless compression and analysis of microarray images , 2006, SAC.

[10]  David Botstein,et al.  The Stanford Microarray Database , 2001, Nucleic Acids Res..