Searching Complex Data Without an Index

We show how query-specific content-based computation can be used for interactive search when a pre-computed index is not available. Rather than text or numeric data, we focus on complex data such as digital photographs and medical images. We describe a system that can perform such interactive searches on stored data as well as live Web data. The system is able to narrow the focus of a non-indexed search by using structured data sources such as relational databases. It can also leverage domain-specific software tools in search computations. We report on the design and implementation of this system, and its use in the health sciences.
