Limitless HTTP in an HTTPS World: Inferring the Semantics of the HTTPS Protocol without Decryption

We present new analytic techniques that infer HTTP semantics from passive observations of HTTPS. The techniques infer the values of important fields, including the status-code, Content-Type, and Server, as well as the presence or absence of several additional HTTP header fields, e.g., Cookie and Referer. Our goals are to improve the understanding of the confidentiality limitations of HTTPS and to explore benign uses of traffic analysis that could replace HTTPS interception and static private keys in some scenarios. We found that our techniques increase the efficacy of malware detection but do not enable more powerful website fingerprinting attacks against Tor. Our broader set of results raises concerns about the confidentiality goals of TLS relative to a user's expectation of privacy, warranting future research. We apply our methods to the semantics of both HTTP/1.1 and HTTP/2 on data collected from automated runs of Firefox 58.0, Chrome 63.0, and Tor Browser 7.0.11 in a lab setting, and from applications running in a malware sandbox. For the malware sandbox, we obtain ground-truth plaintext for a diverse set of applications by extracting the key material needed for decryption from RAM post-execution. We developed an iterative approach that simultaneously solves several multi-class (field value) and binary (field presence) classification problems, and we show that our inference algorithm achieves an unweighted $F_1$ score greater than 0.900 for most HTTP fields examined.
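To make the reported metric concrete, the following is a minimal sketch of how an unweighted $F_1$ score could be computed for one multi-class field such as the status-code. It assumes that "unweighted" means the macro average, i.e., the plain mean of per-class $F_1$ scores, so that rare field values count as much as common ones. The function name `macro_f1` and the sample labels are illustrative and do not come from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Mean of per-class F1 scores, with every class weighted equally.

    Assumption: this is what the abstract calls the "unweighted" F1 score.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical status-code labels for four encrypted responses.
true_codes = ["200", "200", "200", "404"]
pred_codes = ["200", "200", "404", "404"]
score = macro_f1(true_codes, pred_codes)
```

In this example the per-class scores are 0.8 for "200" and 2/3 for "404", and the macro average is their mean. A frequency-weighted average would instead be dominated by the common "200" class.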
