No NAT'd User Left Behind: Fingerprinting Users behind NAT from NetFlow Records Alone

It is generally recognized that the network traffic generated by an individual acts as a biometric signature of that individual. Several tools exploit this fact to fingerprint and monitor users. Often, though, these tools require access to the entire traffic, including IP addresses and payloads. In general, this is not feasible, because both performance and privacy would suffer. In practice, most ISPs convert user traffic into NetFlow records, a concise representation that does not include the payload. More importantly, the IP addresses of hosts in a large, distributed network are usually masked by Network Address Translation (NAT), so that a few IP addresses (NAT'd IPs) may be associated with thousands of individuals. We devised a new fingerprinting framework that overcomes these hurdles. Our system analyzes a huge amount of network traffic represented as NetFlow records, with the intent of tracking people. It does so by accurately inferring when users are connected to the network and which IP addresses they are using, even though thousands of users are hidden behind NAT. Our prototype implementation was deployed and tested within an existing large metropolitan WiFi network serving about 200,000 users, with an average load of more than 1,000 users simultaneously connected behind only 2 NAT'd IP addresses. Our solution turned out to be very effective, with an accuracy greater than 90%. We also devised new tools and refined existing ones that may be applied to other contexts related to NetFlow analysis.
