Text Extraction from Smartphone Screenshots to Archive in situ Media Behavior

Life experiences are increasingly intertwined with digital devices, making screens a preferred, if not required, data source for behavioral studies and health interventions. Extracting text from digital screenshots is therefore a key prerequisite for the overall accuracy of analyses of media behavior. This unique image data set offers the opportunity (i) to test existing image processing and text recognition methods, and (ii) to identify and discuss the computational challenges specific to this case. Our aim is to assess whether and how state-of-the-art methodologies can be applied to this novel data set. We show that combining OpenCV-based pre-processing with a long short-term memory (LSTM) based release of Tesseract OCR, without ad hoc training, yields 74% text accuracy at the character level. We discuss the implications and incidence of different error factors on the quality of the resulting text, and outline future research trajectories.
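To make the notion of character-level accuracy concrete, the following is a minimal sketch of one common way to score OCR output against a ground-truth transcription, based on the Levenshtein edit distance. The abstract does not state the exact formula used, so the definition here (one minus the edit distance divided by the ground-truth length, floored at zero) is an assumption, and the function names `levenshtein` and `character_accuracy` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Assumed definition: 1 - edit_distance / len(ground_truth),
    floored at 0 so that very noisy output cannot score negative."""
    if not ground_truth:
        return 1.0 if not ocr_output else 0.0
    distance = levenshtein(ground_truth, ocr_output)
    return max(0.0, 1.0 - distance / len(ground_truth))
```

Under this assumed definition, a four-character ground truth recognized with a single substituted character scores 0.75. Other normalizations, such as dividing by the longer of the two strings, are also in use and would give slightly different figures.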
