A comparison of local features for camera-based document image retrieval and spotting

This paper compares the robustness of local features for camera-based document image retrieval and spotting systems. We present a literature review of the state of the art in local feature extraction, covering keypoint detectors and keypoint descriptors. We also present a dataset and an evaluation protocol for camera-based document image retrieval and spotting systems. The dataset is composed of three subparts: the first contains images with textual content only; the second contains images with mainly graphical content; the third contains images that mix text and graphical elements. Along with the dataset, we present a protocol that describes measurements for evaluating the accuracy and processing time of camera-based document image retrieval and spotting systems. We use this protocol to present a detailed evaluation of local features from the literature.
