Demographic Information Inference through Meta-Data Analysis of Wi-Fi Traffic

Privacy inference through meta-data (e.g., IP, Host) analysis of Wi-Fi traffic poses a potentially more serious threat to user privacy than content inspection, for two reasons. First, it is a more efficient and scalable way to infer users' sensitive information, because it does not need to examine the content of Wi-Fi traffic. Second, meta-data based demographics inference works on both unencrypted and encrypted traffic (e.g., HTTPS traffic). In this study, we present a novel approach to inferring user demographic information by exploiting the meta-data of Wi-Fi traffic. We develop an inference framework based on machine learning and evaluate its performance on a real-world dataset that covers the Wi-Fi access of 28,158 users over five months. The framework extracts four kinds of features from real-world Wi-Fi traffic and applies a machine learning technique (XGBoost) to predict user demographics. Our results show that the overall accuracy of inferring users' gender and education level reaches 82 and 78 percent, respectively. Surprisingly, even for HTTPS traffic, user demographics can still be predicted with accuracies of 69 and 76 percent, respectively, which demonstrates the practicality of the proposed privacy inference scheme. Finally, we discuss and evaluate potential mitigation methods for such inference attacks.
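
As a minimal sketch of what meta-data based feature extraction can look like, the Python snippet below turns per-user flow records into normalized host-visit histograms. The abstract does not specify the four feature kinds, so the record layout, the sample hosts, and the helper name `host_frequency_features` are all assumptions made for illustration, not the framework's actual features. Vectors of this kind could then be passed to a gradient-boosted tree classifier such as XGBoost.

```python
from collections import Counter, defaultdict

# Hypothetical flow records: (user_id, destination host, hour of day).
# Only meta-data is used; no packet payload is inspected.
RECORDS = [
    ("u1", "shop.example.com", 21),
    ("u1", "shop.example.com", 22),
    ("u1", "news.example.org", 8),
    ("u2", "games.example.net", 23),
    ("u2", "news.example.org", 9),
]


def host_frequency_features(records):
    """Map each user to a normalized histogram of visited hosts.

    Returns the sorted host vocabulary and a dict of per-user vectors
    whose entries follow the vocabulary order and sum to 1.
    """
    counts = defaultdict(Counter)
    for user, host, _hour in records:
        counts[user][host] += 1

    # A fixed, sorted vocabulary keeps vector positions comparable across users.
    hosts = sorted({host for _user, host, _hour in records})

    features = {}
    for user, host_counts in counts.items():
        total = sum(host_counts.values())
        features[user] = [host_counts[h] / total for h in hosts]
    return hosts, features


hosts, features = host_frequency_features(RECORDS)
```

Because the histogram depends only on which hosts a user contacts, the same features remain available when the payload is encrypted, as long as the host is still observable (an assumption of this sketch).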
