An identification framework for print-scan books in a large database

In this paper, we propose an identification framework for detecting copyright infringement in the form of illegally distributed print-scan books in a large database. The framework contains the following main stages: image pre-processing, feature vector extraction, clustering, indexing, and hierarchical search. The image pre-processing stage provides methods for alleviating the distortions induced by a scanner or digital camera. From the pre-processed image, we generate feature vectors that are robust against distortion. To enhance clustering performance in a large database, we use a clustering method based on the parallel-distributed computing of Hadoop MapReduce. In addition, to store the clustered feature vectors efficiently and minimize the search time, we investigate an inverted index for feature vectors. Finally, we implement a two-step hierarchical search to achieve fast and accurate on-line identification. In simulations, the proposed identification framework is shown to be accurate and robust in the presence of print-scan distortions. The processing time analysis in a parallel computing environment demonstrates that the proposed framework extends to massive data. In the matching performance analysis, we find both empirically and theoretically that, in terms of query time, the optimal number of clusters scales with O(N) for N print-scan books.
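The indexing and search stages can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the toy two-dimensional feature vectors, the precomputed centroids, and the names `build_inverted_index` and `hierarchical_search` are all assumptions made for illustration. The sketch only shows the general idea of an inverted index from cluster to books, followed by a two-step search that first selects a cluster and then scans only its members.

```python
from collections import defaultdict
from math import dist  # Euclidean distance; the paper's metric is not stated here


def nearest(vec, candidates):
    """Return the position of the candidate closest to vec."""
    return min(range(len(candidates)), key=lambda i: dist(vec, candidates[i]))


def build_inverted_index(features, centroids):
    """Map each cluster id to the list of book ids assigned to that cluster."""
    index = defaultdict(list)
    for book_id, vec in features.items():
        index[nearest(vec, centroids)].append(book_id)
    return index


def hierarchical_search(query, features, centroids, index):
    """Step 1: pick the nearest cluster. Step 2: scan only that cluster."""
    cluster_id = nearest(query, centroids)
    members = index.get(cluster_id, [])
    if not members:
        return None
    return min(members, key=lambda book_id: dist(query, features[book_id]))


# Hypothetical toy database: book id -> feature vector.
features = {
    "book_a": (0.0, 0.1),
    "book_b": (0.2, 0.0),
    "book_c": (5.0, 5.1),
    "book_d": (5.2, 4.9),
}
# Hypothetical centroids, assumed to come from an offline clustering stage.
centroids = [(0.1, 0.05), (5.1, 5.0)]

index = build_inverted_index(features, centroids)
match = hierarchical_search((5.15, 4.95), features, centroids, index)
print(match)
```

In this sketch a query is compared against every centroid and then only against the books in the selected cluster, so the number of clusters trades off the cost of the first step against the cost of the second.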
