The amount of publicly available biological data has grown exponentially as technology has advanced. Online databases now play an important role both as information repositories and as easily accessible platforms through which researchers communicate and contribute. Recent research projects in image bioinformatics have produced a number of image databases. Each image visualizes the spatial expression pattern of a gene (e.g., "fj"), and most images also carry one or several annotation keywords (e.g., "embryonic hindgut").
C-DEM is an online system for Drosophila (fruit fly) Embryo images Mining. It supports queries from any of three modalities to any of the three: (a) genes, (b) images of gene expression, and (c) annotation keywords of the images. Thus, it can find images that are similar to a given image, and/or related to desired annotation keywords, and/or related to specific genes. Typical queries are "what are the most suitable keywords to assign to image insitu28465.jpg?" and "find images that are related to the gene 'fj' and to the keyword 'embryonic hindgut'". C-DEM uses state-of-the-art feature extraction methods for images (wavelets and principal component analysis). It models the whole database as a tri-partite graph (one node type for each modality), and it uses a fast and flexible proximity measure, random walk with restarts (RWR).
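To make the proximity measure concrete, the following is a minimal sketch of random walk with restarts on a toy tri-partite graph, computed by plain power iteration. It is an illustration under stated assumptions, not the C-DEM implementation: the node names other than "fj" and "embryonic hindgut", the edge list, the restart probability of 0.15, and the function name `random_walk_with_restart` are all hypothetical. The sketch also links images only to their genes and keywords. Any image-to-image similarity links that the wavelet and PCA features would support are left out.

```python
import numpy as np

def random_walk_with_restart(adjacency, seeds, restart_prob=0.15,
                             tol=1e-10, max_iter=1000):
    """Return steady-state RWR scores of all nodes with respect to the seed nodes."""
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    W = A / col_sums                       # column-normalized transition matrix
    e = np.zeros(A.shape[0])
    e[list(seeds)] = 1.0 / len(seeds)      # restart distribution over the query nodes
    r = e.copy()
    for _ in range(max_iter):
        r_next = (1.0 - restart_prob) * (W @ r) + restart_prob * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Toy tri-partite graph (hypothetical data): genes, images, keywords.
nodes = ["gene:fj", "gene:g2",
         "img:0", "img:1", "img:2",
         "kw:embryonic hindgut", "kw:k2"]
index = {name: i for i, name in enumerate(nodes)}

# Each image is linked to its gene and to its annotation keyword.
edges = [("img:0", "gene:fj"), ("img:0", "kw:embryonic hindgut"),
         ("img:1", "gene:fj"), ("img:1", "kw:k2"),
         ("img:2", "gene:g2"), ("img:2", "kw:k2")]

A = np.zeros((len(nodes), len(nodes)))
for u, v in edges:
    A[index[u], index[v]] = 1.0
    A[index[v], index[u]] = 1.0

# Query: images related to gene "fj" AND keyword "embryonic hindgut".
seeds = [index["gene:fj"], index["kw:embryonic hindgut"]]
scores = random_walk_with_restart(A, seeds)

# Rank only the image nodes by their proximity to the query.
image_names = [n for n in nodes if n.startswith("img:")]
ranked_images = sorted(image_names, key=lambda n: scores[index[n]], reverse=True)
print(ranked_images)
```

Because the query is just the restart distribution, the same routine covers the other query directions: seeding with a single image node and ranking the keyword nodes gives keyword suggestions for that image, and seeding with a gene node and ranking image nodes gives images related to that gene.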
In addition to flexible querying, C-DEM allows for navigation: the user can click on any result of an earlier query (image thumbnails, keywords, or genes), and the system will report the most closely related images, keywords, and genes. The demo runs on a real Drosophila embryo database with 10,204 images, 2,969 distinct genes, and 113 annotation keywords. Query response time is below one second on a commodity desktop.