Biological Image Analysis via Matrix Approximation

Understanding the roles of genes and their interactions is one of the central challenges in genome research. One popular approach is based on the analysis of microarray gene expression data (Golub et al., 1999; White et al., 1999; Oshlack et al., 2007). By their very nature, these data often do not capture the spatial patterns of individual gene expression. Capturing such patterns requires direct visualization of the presence or absence of gene products (mRNA or protein) (e.g., Tomancak et al., 2002; Christiansen et al., 2006). For instance, gene expression pattern images of a Drosophila melanogaster embryo capture the spatial and temporal distribution of gene expression at a given developmental stage (Bownes, 1975; Tsai et al., 1998; Myasnikova et al., 2002; Harmon et al., 2007). The identification of genes showing spatial overlaps in their expression patterns is fundamentally important to formulating and testing gene interaction hypotheses (Kumar et al., 2002; Tomancak et al., 2002; Gurunathan et al., 2004; Peng & Myers, 2004; Pan et al., 2006). Recent high-throughput experiments on Drosophila have produced over fifty thousand images (http://www.fruitfly.org/cgi-bin/ex/insitu.pl). It is thus desirable to design efficient computational approaches that can automatically retrieve images with overlapping expression patterns. There are two primary ways of accomplishing this task. In the first approach, gene expression patterns are described using a controlled vocabulary, and images containing overlapping patterns are found based on the similarity of their textual annotations. In the second approach, the most similar expression patterns are identified by a direct comparison of image content, emulating the visual inspection carried out by biologists (Kumar et al., 2002; see also www.flyexpress.net).
The direct comparison of image content is expected to be complementary to, and more powerful than, the controlled vocabulary approach, because it is unlikely that all attributes of an expression pattern can be completely captured via textual descriptions. Hence, to facilitate the efficient and widespread use of such datasets, there is a significant need for sophisticated, high-performance, informatics-based solutions for the analysis of large collections of biological images.

[1] Jieping Ye, et al. GPCA: an efficient dimension reduction scheme for image compression and retrieval, 2004, KDD.

[2] Gordon K. Smyth, et al. Using DNA microarrays to study gene expression in closely related species, 2007, Bioinform.

[3] Jieping Ye, et al. Generalized Low Rank Approximations of Matrices, 2004, Machine Learning.

[4] M. Ashburner, et al. Systematic determination of patterns of gene expression during Drosophila embryogenesis, 2002, Genome Biology.

[5] Heng Tao Shen, et al. Principal Component Analysis, 2009, Encyclopedia of Biometrics.

[6] Geoffrey J. McLachlan, et al. Finite Mixture Models, 2019, Annual Review of Statistics and Its Application.

[7] M. Bownes, et al. A photographic study of development in the living embryo of Drosophila melanogaster, 1975, Journal of embryology and experimental morphology.

[8] S. Panchanathan, et al. BEST: a novel computational approach for comparing gene expression patterns from early stages of Drosophila melanogaster development, 2002, Genetics.

[9] Susan R. Wilson, et al. Use of Principal Component Analysis and the GE-Biplot for the Graphical Exploration of Gene Expression Data, 2005, Biometrics.

[10] S. Shankar Sastry, et al. Comparative Analysis of Spatial Patterns of Gene Expression in Drosophila melanogaster Imaginal Discs, 2007, RECOMB.

[11] C. Tsai, et al. Pair-rule gene runt restricts orthodenticle expression to the presumptive head of the Drosophila embryo, 1998, Developmental genetics.

[12] H. Sebastian Seung, et al. Learning the parts of objects by non-negative matrix factorization, 1999, Nature.

[13] Eugene W. Myers, et al. Comparing in situ mRNA expression patterns of Drosophila embryos, 2004, RECOMB.

[14] Scott A. Rifkin, et al. Microarray analysis of Drosophila development during metamorphosis, 1999, Science.

[15] J. Mesirov, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, 1999, Science.

[16] M. Turk, et al. Eigenfaces for Recognition, 1991, Journal of Cognitive Neuroscience.

[17] P. Deb. Finite Mixture Models, 2008.

[18] Amir Averbuch, et al. Image compression using wavelet transform and multiresolution decomposition, 1996, IEEE Trans. Image Process.

[19] John Reinitz, et al. Support vector regression applied to the determination of the developmental age of a Drosophila embryo from its segmentation gene expression patterns, 2002, ISMB.

[20] Sethuraman Panchanathan, et al. Identifying spatially similar gene expression patterns in early stage fruit fly embryo images: binary feature versus invariant moment digital representations, 2004, BMC Bioinformatics.

[21] Nicholas Burton, et al. EMAGE: a spatial database of gene expression patterns during mouse embryo development, 2005, Nucleic Acids Res.

[22] Christos Faloutsos, et al. Automatic mining of fruit fly embryo images, 2006, KDD '06.

[23] Robert H. Halstead, et al. Matrix Computations, 2011, Encyclopedia of Parallel Computing.

[24] Naren Ramakrishnan, et al. Compression, clustering, and pattern discovery in very high-dimensional discrete-attribute data sets, 2005, IEEE Transactions on Knowledge and Data Engineering.