Adaptive parallel approximate similarity search for responsive multimedia retrieval

This paper introduces Hypercurves, a flexible framework for providing similarity search indexing to high-throughput multimedia services. Hypercurves efficiently and effectively answers k-nearest neighbor searches on multigigabyte high-dimensional databases. It supports massively parallel processing and adapts its parallelization regimens at runtime to keep answer times optimal under both low and high demand. To achieve these goals, Hypercurves introduces new techniques for selecting parallelism configurations and allocating threads to computation cores, including hyperthreaded cores. Its efficiency gains are thoroughly validated on a large database of multimedia descriptors, where it achieved near-linear speedups and superlinear scaleups. Compared to the best static parallelism regimens, the adaptation reduces query response times by 43% and 74% on the two platforms tested.
