The Parallel Distributed Image Search Engine (ParaDISE)

Image retrieval is a complex task that differs according to the context and to the user requirements of each specific field, for example a medical environment. Search by text is often impossible or suboptimal, and retrieval by visual content does not always succeed in modelling the high-level concepts that a user is looking for. Modern image retrieval techniques consist of multiple steps. They aim to retrieve information from large-scale datasets based not only on global image appearance but also on local features and, where possible, on a connection between visual features and text or semantics. This paper presents the Parallel Distributed Image Search Engine (ParaDISE), an image retrieval system that combines visual search with text-based retrieval and that is available as open source and free of charge. The main design concepts of ParaDISE are flexibility, expandability, scalability and interoperability. These concepts make the system usable both in real-world applications and as a research platform for image retrieval. Apart from the architecture and the implementation of the system, two use cases are described: an application of ParaDISE to the retrieval of images from the medical literature, and a visual feature evaluation for medical image retrieval. Future steps include the creation of an open source community that will contribute to and expand this platform based on the existing parts.
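The abstract does not specify how ParaDISE merges its visual and text-based result lists. As an illustration only, the sketch below shows one common option, score-level (late) fusion. Each modality's scores are min-max normalized to [0, 1], and the two are then combined with a weighted sum (a weighted CombSUM). The function names `min_max_normalize` and `late_fusion`, the normalization scheme and the default weight are hypothetical assumptions, not ParaDISE's API or its documented method.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one modality's raw scores to [0, 1] so modalities are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents scored equally: treat them as equally (fully) relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def late_fusion(
    text_scores: Dict[str, float],
    visual_scores: Dict[str, float],
    text_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Hypothetical weighted linear (CombSUM-style) fusion of text and visual scores.

    A document missing from one result list contributes 0 for that modality.
    Returns (doc_id, fused_score) pairs sorted by descending fused score.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    t = min_max_normalize(text_scores)
    v = min_max_normalize(visual_scores)
    fused = {
        doc: text_weight * t.get(doc, 0.0) + (1.0 - text_weight) * v.get(doc, 0.0)
        for doc in set(t) | set(v)
    }
    # Sort by score descending; break ties by doc id for a deterministic order.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    text = {"img1": 12.0, "img2": 8.0, "img3": 2.0}      # e.g. text relevance scores
    visual = {"img2": 0.9, "img3": 0.7, "img4": 0.1}     # e.g. visual similarity scores
    for doc, score in late_fusion(text, visual, text_weight=0.6):
        print(f"{doc}\t{score:.3f}")
```

In this example, `img2` ranks first because it scores well in both modalities, even though `img1` has the highest text score. A `text_weight` closer to 1 favours the text modality. Rank-based rules such as reciprocal rank fusion are an equally plausible alternative when raw scores are not comparable across modalities.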
