Feature Selection for Website Fingerprinting

Abstract

Website fingerprinting based on TCP/IP headers is of significant relevance to several Internet entities. Prior work has focused on only a limited set of features and does not help us understand the extent of fingerprintability. We address this by conducting an exhaustive feature analysis within eight different communication scenarios. Our analysis reveals several previously unknown features in several scenarios that can be used to fingerprint websites with much higher accuracy than previously demonstrated. This work helps the community better understand the extent of learnability (and vulnerability) from TCP/IP headers.
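The abstract does not spell out a feature pipeline. As a purely illustrative sketch, assuming a trace is a list of (timestamp, direction, IP length) tuples read from packet headers, the Python below derives a few header-only features and ranks them by their empirical mutual information with the website label. The trace format, the feature names, the equal-width binning, and the mutual-information criterion are all hypothetical choices made for illustration; they are not the feature set or ranking method used in this work.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical trace format: (timestamp_s, direction, ip_length_bytes), with
# direction +1 for client-to-server and -1 for server-to-client.
Packet = Tuple[float, int, int]


def header_features(trace: Sequence[Packet]) -> Dict[str, float]:
    """A few illustrative features computed only from header-visible fields."""
    out_sizes = [size for _, d, size in trace if d > 0]
    in_sizes = [size for _, d, size in trace if d < 0]
    times = [t for t, _, _ in trace]
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        "n_packets": float(len(trace)),
        "n_out": float(len(out_sizes)),
        "n_in": float(len(in_sizes)),
        "bytes_out": float(sum(out_sizes)),
        "bytes_in": float(sum(in_sizes)),
        "duration": times[-1] - times[0] if times else 0.0,
        "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
    }


def equal_width_bins(xs: Sequence[float], n_bins: int) -> List[int]:
    """Discretize values into n_bins equal-width bins over their observed range."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0] * len(xs)
    width = (hi - lo) / n_bins
    return [min(int((x - lo) / width), n_bins - 1) for x in xs]


def mutual_information(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X; Y), in bits, between two discrete sequences."""
    n = len(values)
    joint = Counter(zip(values, labels))
    px, py = Counter(values), Counter(labels)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def rank_features(
    traces: List[Sequence[Packet]], labels: List[str], n_bins: int = 10
) -> List[Tuple[str, float]]:
    """Rank each feature by its mutual information with the website label."""
    feats = [header_features(t) for t in traces]
    scores = []
    for name in feats[0]:
        binned = equal_width_bins([f[name] for f in feats], n_bins)
        scores.append((name, mutual_information(binned, labels)))
    # Stable sort: features with equal scores keep their original order.
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Binning each feature over its own observed range keeps features with very different scales (bytes versus seconds) comparable. A single-feature ranking like this one ignores interactions and correlations between features, so it is at best a first pass over a large candidate set.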
