Revisiting the Threat Space for Vision-based Keystroke Inference Attacks

A vision-based keystroke inference attack is a side-channel attack in which an attacker uses an optical device to record users on their mobile devices and infer their keystrokes. The threat space for these attacks has been studied in the past, but we argue that the defining characteristics for this threat space, namely the strength of the attacker, are outdated. Previous works do not study adversaries with vision systems that have been trained with deep neural networks because these models require large amounts of training data and curating such a dataset is expensive. To address this, we create a large-scale synthetic dataset to simulate the attack scenario for a keystroke inference attack. We show that first pre-training on synthetic data, followed by adopting transfer learning techniques on real-life data, increases the performance of our deep learning models. This indicates that these models are able to learn rich, meaningful representations from our synthetic data and that training on the synthetic data can help overcome the issue of having small, real-life datasets for vision-based key stroke inference attacks. For this work, we focus on single keypress classification where the input is a frame of a keypress and the output is a predicted key. We are able to get an accuracy of 95.6% after pre-training a CNN on our synthetic data and training on a small set of real-life data in an adversarial domain adaptation framework. Source Code for Simulator: this https URL

[1]  Michael Backes,et al.  2008 IEEE Symposium on Security and Privacy Compromising Reflections –or– How to Read LCD Monitors Around the Corner , 2022 .

[2]  Tomas Pfister,et al.  Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Markus G. Kuhn,et al.  Compromising Emanations , 2002, Encyclopedia of Cryptography and Security.

[4]  Trevor Darrell,et al.  Best Practices for Fine-Tuning Visual Classifiers to New Domains , 2016, ECCV Workshops.

[5]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[6]  Zhen Ling,et al.  Blind Recognition of Touched Keys on Mobile Devices , 2014, CCS.

[7]  Luc Van Gool,et al.  Learning Semantic Segmentation From Synthetic Data: A Geometrically Guided Input-Output Adaptation Approach , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Martin Welk,et al.  Tempest in a Teapot: Compromising Reflections Revisited , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[10]  Peter Robinson,et al.  Learning an appearance-based gaze estimator from one million synthesised images , 2016, ETRA.

[11]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Tao Li,et al.  EyeTell: Video-Assisted Touchscreen Keystroke Inference from Eye Movements , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[13]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Jan-Michael Frahm,et al.  Seeing double: reconstructing obscured typed input from repeated compromising reflections , 2013, CCS.

[15]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[16]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[17]  Trevor Darrell,et al.  Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Kate Saenko,et al.  Syn2Real: A New Benchmark forSynthetic-to-Real Visual Domain Adaptation , 2018, ArXiv.

[19]  Giovanni Vigna,et al.  ClearShot: Eavesdropping on Keyboard Input from Video , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[20]  Rui Zhang,et al.  VISIBLE: Video-Assisted Keystroke Inference from Tablet Backside Motion , 2016, NDSS.

[21]  Jan-Michael Frahm,et al.  iSpy: automatic reconstruction of typed input from compromising reflections , 2011, CCS '11.

[22]  Xiaojiang Chen,et al.  Cracking Android Pattern Lock in Five Attempts , 2017, NDSS.

[23]  Dumitru Erhan,et al.  Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Taesung Park,et al.  CyCADA: Cycle-Consistent Adversarial Domain Adaptation , 2017, ICML.

[25]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  Rajesh Kumar,et al.  Beware, Your Hands Reveal Your Secrets! , 2014, CCS.