Data encryption is the primary means of protecting the privacy of consumer devices' Internet communications from network observers. The ability to automatically detect unencrypted data in network traffic is therefore an essential tool for auditing Internet-connected devices. Existing methods identify network packets containing cleartext, but they cannot differentiate packets containing encrypted data from packets containing compressed unencrypted data, which anyone can easily recover by reversing the compression algorithm. This makes it difficult for consumer protection advocates to identify devices that put user privacy at risk by sending sensitive data in a compressed but unencrypted format. Here, we present the first technique to automatically distinguish encrypted from compressed unencrypted network transmissions on a per-packet basis. We apply three machine learning models and achieve a maximum accuracy of 66.9% with a convolutional neural network trained on raw packet data. This result establishes a baseline for this previously unstudied machine learning problem, which we hope will motivate further attention and accuracy improvements. To facilitate continued research on this topic, we have made our training and test datasets publicly available.
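To illustrate why compressed and encrypted payloads are hard to tell apart, here is a minimal sketch that assumes a cleartext detector based on byte entropy. This is an assumption: the abstract does not say how existing methods work, and this is not the authors' technique. The JSON-like payload is hypothetical, and because the Python standard library has no symmetric cipher, random bytes from `os.urandom` stand in for ciphertext, on the assumption that a modern cipher's output is indistinguishable from random. `byte_entropy` is a hypothetical helper name.

```python
import collections
import math
import os
import random
import zlib


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = collections.Counter(data)  # iterating bytes yields ints 0..255
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical cleartext payload: JSON-like readings from an imaginary device.
rng = random.Random(0)
plaintext = "".join(
    f'{{"device_id": {i}, "heart_rate": {rng.randint(50, 120)}, '
    f'"timestamp": {1500000000 + 60 * i}}}\n'
    for i in range(2000)
).encode("ascii")

# Compressed but unencrypted: DEFLATE via zlib.
compressed = zlib.compress(plaintext, 9)

# Stand-in for ciphertext (assumption): random bytes of the same length.
ciphertext_like = os.urandom(len(compressed))

for label, payload in [
    ("cleartext", plaintext),
    ("compressed", compressed),
    ("encrypted (simulated)", ciphertext_like),
]:
    print(f"{label:>22}: {byte_entropy(payload):.2f} bits/byte")

# Compression provides no confidentiality: anyone can invert it.
assert zlib.decompress(compressed) == plaintext
```

The cleartext should score well below the other two. The compressed and simulated-encrypted payloads should both land near the 8 bits/byte maximum, so an entropy threshold that flags cleartext cannot separate them. Yet the compressed payload is fully recoverable with a single `zlib.decompress` call.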