Mobile Video Capture of Multi-page Documents

This paper presents a mobile application for capturing images of printed multi-page documents with a smartphone camera. With currently available document capture applications, the user must carefully photograph each page individually and assemble the photographs into a document, which makes for a cumbersome and time-consuming experience. We propose a novel approach that uses video to capture multi-page documents. Our algorithm automatically selects, from the video, the best still image for each page of the document. To do so, it combines video motion analysis, inertial sensor signals, and an image quality (IQ) prediction technique. For IQ prediction, we extend a previous no-reference algorithm to suit the needs of our video application. The algorithm has been implemented on an iPhone 4S. Individual pages are successfully extracted for a wide variety of multi-page documents. OCR analysis shows that the quality of document images produced by our app is comparable to that of standard still captures. At the same time, user studies confirm that, in the majority of trials, video capture provides an experience that is faster and more convenient than multiple still captures.
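
To make the frame-selection idea concrete, the following is a minimal sketch, not the implementation described in the paper. It assumes each video frame already carries two hypothetical scalar scores: `motion` (a combined video and inertial motion estimate, higher meaning more movement) and `quality` (a no-reference IQ prediction, higher meaning better). The names `Frame`, `select_page_frames`, `motion_threshold`, and `min_stable_frames` are illustrative assumptions. The sketch treats each run of low-motion frames as one page and keeps the highest-quality frame in that run.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    index: int      # position of the frame in the video
    motion: float   # hypothetical combined motion score (video + inertial)
    quality: float  # hypothetical no-reference IQ score, higher is better


def select_page_frames(
    frames: List[Frame],
    motion_threshold: float = 0.5,  # assumed cutoff between "stable" and "moving"
    min_stable_frames: int = 3,     # assumed minimum length of a stable run
) -> List[int]:
    """Return the index of the best-quality frame in each stable segment."""
    selected: List[int] = []
    segment: List[Frame] = []

    def flush() -> None:
        # Keep the segment only if it is long enough to plausibly show a page.
        if len(segment) >= min_stable_frames:
            best = max(segment, key=lambda f: f.quality)
            selected.append(best.index)
        segment.clear()

    for frame in frames:
        if frame.motion < motion_threshold:
            segment.append(frame)  # camera and page appear steady
        else:
            flush()                # movement (e.g. a page turn) ends the segment
    flush()                        # handle a stable segment at the end of the video
    return selected


if __name__ == "__main__":
    motions = [0.1, 0.1, 0.1, 0.9, 0.9, 0.2, 0.1, 0.1, 0.1]
    qualities = [0.3, 0.8, 0.5, 0.1, 0.1, 0.4, 0.6, 0.9, 0.7]
    frames = [Frame(i, m, q) for i, (m, q) in enumerate(zip(motions, qualities))]
    print(select_page_frames(frames))
```

A fixed threshold is the simplest way to segment the video; a real system would likely need to tune or adapt it per device and capture condition.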
