California-ND: An annotated dataset for near-duplicate detection in personal photo collections

Managing photo collections involves a variety of image quality assessment tasks, e.g., selecting the “best” photos. Detecting near-duplicate images is a prerequisite for automating these tasks. This paper presents a new dataset intended to help researchers test algorithms for detecting near-duplicates in personal photo libraries. The dataset is derived directly from an actual personal travel photo collection and contains many difficult cases and types of near-duplicates. More importantly, to deal with the inevitable ambiguity of near-duplicate cases, the dataset is annotated by 10 different subjects. These annotations are combined into a non-binary ground truth, which indicates the probability that an observer would consider a pair of images to be near-duplicates.
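One way to picture a non-binary ground truth is as the fraction of subjects who marked a given pair as a near-duplicate. The sketch below is illustrative only: the abstract does not specify how the annotations were collected or combined, so the input format (one list of image pairs per subject), the function name `soft_ground_truth`, and the file names are all assumptions.

```python
from collections import Counter


def soft_ground_truth(annotations):
    """Estimate, for each image pair, the fraction of subjects who marked it.

    `annotations` is a list with one entry per subject; each entry is an
    iterable of (image_a, image_b) pairs that the subject considered
    near-duplicates. This input format is an assumption for illustration.
    """
    counts = Counter()
    for pairs in annotations:
        # Sort each pair so (a, b) and (b, a) count as the same pair,
        # and use a set so one subject cannot vote twice for a pair.
        unique_pairs = {tuple(sorted(pair)) for pair in pairs}
        counts.update(unique_pairs)
    n_subjects = len(annotations)
    # Pairs that no subject marked are simply absent (implicit probability 0).
    return {pair: votes / n_subjects for pair, votes in counts.items()}


# Hypothetical votes from four subjects over three images.
votes = [
    [("a.jpg", "b.jpg"), ("b.jpg", "c.jpg")],
    [("b.jpg", "a.jpg")],
    [("a.jpg", "b.jpg")],
    [],
]
truth = soft_ground_truth(votes)
```

With a soft label like this, a detector can be scored against the degree of agreement between observers instead of against a single annotator's yes/no decision.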
