Paper information - A comparison of local features for camera-based document image retrieval and spotting

This paper compares the robustness of local features for camera-based document image retrieval and spotting systems. We present a literature review of the state of the art in local feature extraction, covering both keypoint detectors and keypoint descriptors. We also present a dataset and an evaluation protocol for camera-based document image retrieval and spotting systems. The dataset is composed of three subparts: the first contains images with textual content only; the second contains images with mainly graphical content; the third contains images that combine text and graphical elements. Along with the dataset, we present a protocol that defines the measurements used to evaluate the accuracy and processing time of camera-based document image retrieval and spotting systems. This protocol is then used to present a detailed evaluation of local features from the literature.
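The abstract does not spell out the protocol's measurements, so the following is only an illustrative sketch of how accuracy and processing time could be recorded for a retrieval system. It assumes top-1 accuracy and mean wall-clock time per query as the two measurements; the function name `evaluate_retrieval`, the `retrieve` callback, and the query format are all hypothetical and not taken from the paper.

```python
import time
from typing import Callable, Hashable, Sequence, Tuple


def evaluate_retrieval(
    retrieve: Callable[[object], Hashable],
    queries: Sequence[Tuple[object, Hashable]],
) -> Tuple[float, float]:
    """Return (top-1 accuracy, mean seconds per query).

    `retrieve` is a hypothetical callback mapping a query image to the
    identifier of the best-matching document. `queries` is a sequence of
    (query_image, expected_document_id) pairs. Both are assumptions made
    for illustration only.
    """
    if not queries:
        raise ValueError("queries must not be empty")

    correct = 0
    elapsed = 0.0
    for query_image, expected_id in queries:
        # Time only the retrieval call itself, not the bookkeeping.
        start = time.perf_counter()
        predicted_id = retrieve(query_image)
        elapsed += time.perf_counter() - start
        if predicted_id == expected_id:
            correct += 1

    n = len(queries)
    return correct / n, elapsed / n
```

In practice `retrieve` would wrap a full pipeline (keypoint detection, description, matching against an index), so the measured time would cover all of those stages together; timing each stage separately would need additional instrumentation.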
