Text extraction and retrieval from smartphone screenshots: building a repository for life in media

Daily engagement in life experiences is increasingly interwoven with mobile device use. Screen capture at the scale of seconds is being used in behavioral studies and to implement "just-in-time" health interventions. The increasing psychological breadth of digital information will continue to make the actual screens that people view a preferred, if not required, source of data about life experiences. Effective and efficient Information Extraction and Retrieval from digital screenshots is therefore a crucial prerequisite to the successful use of screen data. In this paper, we present the experimental workflow we used to: (i) pre-process a unique collection of screen captures, (ii) extract unstructured text embedded in the images, (iii) organize image text and metadata according to a structured schema, (iv) index the resulting document collection, and (v) support Image Retrieval through a dedicated vertical search engine application. The adopted procedure integrates different open source libraries for traditional image processing, Optical Character Recognition (OCR), and Image Retrieval. Our aim is to assess whether and how state-of-the-art methodologies can be applied to this novel data set. We show that combining OpenCV-based pre-processing modules with a long short-term memory (LSTM) based release of Tesseract OCR, without ad hoc training, led to a 74% character-level accuracy of the extracted text. Further, we used the processed repository as a baseline for a dedicated Image Retrieval system, intended for immediate use by behavioral and prevention scientists. We discuss issues of Text Information Extraction and Retrieval that are particular to the screenshot image case and suggest important future work.
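To make the reported character-level accuracy concrete, the following is a minimal sketch of one common way to score OCR output against a ground-truth transcription. It assumes accuracy is defined as one minus the edit distance divided by the ground-truth length; the abstract does not state the exact formula or tool used, and the function names `levenshtein` and `character_accuracy` are hypothetical, chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Assumed definition: 1 - edit_distance / len(ground_truth),
    clipped at 0 so very noisy output does not go negative."""
    if not ground_truth:
        return 1.0 if not ocr_output else 0.0
    distance = levenshtein(ground_truth, ocr_output)
    return max(0.0, 1.0 - distance / len(ground_truth))
```

Under this assumed definition, a screenshot whose OCR output differs from a 100-character transcription by 26 edits would score 0.74. Averaging over a collection can be done per image or over pooled characters, and the two choices generally give different figures.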
