Real-time screen reading: reducing domain shift for one-shot learning

Many digital meters, such as those used for home health (e.g. blood pressure monitors) or for monitoring industrial equipment, lack wireless connectivity. Connecting these devices to phone tracking apps or control centres therefore requires either cumbersome manual transcription or costly hardware retrofits. Our motivation is to cheaply retrofit such meters with ‘smart’ data-transfer capabilities using a mobile phone app and limited training data. We demonstrate how a single training image of a meter screen suffices to build an efficient custom meter reader targeted to a chosen device. To this end, we build a CNN-based system that runs in real time on a mobile device with very high read accuracy (close to 100%). Our contributions are (i) the introduction of an exciting new application domain; (ii) a method of training from purely synthetic data that reduces domain shift using a surprisingly simple approach which, unlike adversarial-training-based methods, does not even require unlabelled data; (iii) a highly accurate system for parsing digital meter screens; and (iv) the release of a new screen-reading dataset. The system, although trained solely on synthetic data, transfers very well to the real world. Our screen-detection and text-recognition method also improves over the state of the art on our dataset.
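The idea of training on purely synthetic data derived from a single template image can be illustrated with a minimal domain-randomisation sketch. The function below is a hypothetical illustration only (the `augment_screen` name and all parameter ranges are our assumptions, not the paper's actual pipeline): it perturbs one template crop with random gamma, sensor noise, and brightness shifts so that a model trained on the perturbed copies is less sensitive to the synthetic-to-real domain shift.

```python
import numpy as np

def augment_screen(template, rng):
    """Produce one randomised variant of a single screen template.

    Illustrative sketch of domain randomisation: every transform
    below models a nuisance factor that differs between synthetic
    renders and real phone photos. Ranges are assumptions.
    """
    img = template.astype(np.float32) / 255.0
    # Random gamma: simulates varying screen contrast and exposure.
    img = img ** rng.uniform(0.5, 2.0)
    # Additive Gaussian noise: simulates camera sensor noise.
    img = img + rng.normal(0.0, 0.05, img.shape)
    # Random global brightness shift: simulates ambient lighting.
    img = img + rng.uniform(-0.1, 0.1)
    return np.clip(img * 255.0, 0.0, 255.0).astype(np.uint8)

rng = np.random.default_rng(0)
# Stand-in for a single meter-screen crop (uniform grey rectangle).
template = np.full((64, 128), 200, dtype=np.uint8)
batch = [augment_screen(template, rng) for _ in range(8)]
```

In a full pipeline one would additionally composite rendered digits onto the template and apply perspective warps before augmentation, then train the detector and recogniser entirely on such samples.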
