Smartphone Identification via Passive Traffic Fingerprinting: A Sequence-to-Sequence Learning Approach

Passive cyber-security attacks require neither modification of the data stream generated by the victim nor the creation of a false statement; in particular, attacks based on statistical analysis aim to acquire sensitive information simply by analyzing traffic patterns. Our work rests on the conjecture that the LTE Physical Downlink Control Channel (PDCCH), which is transmitted in clear text, can be used to statistically characterize the traffic generated by a smartphone in standby mode. With this statistical signature, an attacker may infer whether an unknown traffic pattern was generated by the victim user’s terminal, and thus guess whether the victim is in a certain geographical area, in turn gaining the ability to track the victim’s movements and/or profile their habits. In this work, we propose a data collection and processing framework that successfully obtains such signatures. User data patterns (transport block sizes and communication direction) are retrieved by analyzing the mobile network scheduling information. A sequence-to-sequence learning framework is then put forward to extract smartphone signatures from passive traffic; it is experimentally validated on a dataset of 40 user traces, successfully identifying up to 90 percent of the users.
