Batch Nearest Neighbor Search for Video Retrieval

To retrieve videos similar to a query clip from a large database, each video is often represented by a sequence of high-dimensional feature vectors. Given a query video containing m feature vectors, an independent nearest neighbor (NN) search is typically performed for each feature vector first, and an overall similarity is computed once all the NN searches are complete; that is, a single content-based video retrieval usually involves m individual NN searches. Because nearby feature vectors in a video are normally similar, their searches touch many of the same candidates, so a large number of expensive random disk accesses are expected to be repeated, which severely degrades overall query performance. Batch nearest neighbor (BNN) search is defined as a batch operation that performs a number of individual NN searches. This paper presents a novel approach to efficient high-dimensional BNN search, called dynamic query ordering (DQO), that optimizes both I/O and CPU costs. Observing that the overlapped candidates (or search space) of a previous query may help to further reduce the candidate sets of subsequent queries, DQO aims to progressively find a query order in which the common candidates among queries are fully utilized to maximally reduce the total number of candidates. Modelling the candidate set relationships among queries as a candidate overlapping graph (COG), DQO iteratively selects the next query to execute based on its estimated pruning power over the remaining queries, using the dynamically updated COG. Extensive experiments on real video datasets show the significance of our BNN query processing strategy.
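The ordering loop described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say how pruning power is estimated, so this sketch assumes a simple proxy, namely the total number of candidates a query shares with the queries still pending. The names `build_cog`, `dqo_order` and `access_counts`, and the toy candidate sets, are hypothetical. Note that in this simplified model the order does not change the access count; the sketch only shows how a COG can drive query selection and be updated after each query.

```python
from itertools import combinations


def build_cog(candidates):
    """Build a candidate overlapping graph (COG).

    Nodes are queries; an edge weight is the number of candidates
    two queries share. Queries with no shared candidates get no edge.
    """
    cog = {q: {} for q in candidates}
    for a, b in combinations(candidates, 2):
        shared = len(candidates[a] & candidates[b])
        if shared:
            cog[a][b] = shared
            cog[b][a] = shared
    return cog


def dqo_order(candidates):
    """Greedily choose a query order from a dynamically updated COG.

    ASSUMPTION: "estimated pruning power" is approximated by the total
    overlap with the pending queries; the abstract gives no formula.
    """
    # Work on copies so the caller's candidate sets are not modified.
    pending = {q: set(c) for q, c in candidates.items()}
    order = []
    while pending:
        cog = build_cog(pending)
        # Ties are broken by query name so the result is deterministic.
        nxt = max(sorted(pending), key=lambda q: sum(cog[q].values()))
        order.append(nxt)
        done = pending.pop(nxt)
        # Candidates already accessed need not be fetched again, so they
        # are dropped from the pending queries before the COG is rebuilt.
        for remaining in pending.values():
            remaining -= done
    return order


def access_counts(candidates):
    """Return (accesses for m independent searches, accesses when shared)."""
    independent = sum(len(c) for c in candidates.values())
    shared = len(set().union(*candidates.values()))
    return independent, shared


# Hypothetical candidate sets for four query feature vectors.
example = {
    "q1": {1, 2, 3, 4},
    "q2": {3, 4, 5},
    "q3": {4, 5, 6, 7},
    "q4": {9},
}
print(dqo_order(example))
print(access_counts(example))
```

In this toy input, `q2` overlaps most with the other queries and is selected first; once its candidates are removed from the pending sets, the remaining queries no longer overlap and are taken in name order.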
