On the effectiveness of risk prediction based on users' browsing behavior

Users are typically the final target of web attacks: criminals are interested in stealing their money or their personal information, or in infecting their machines with malicious code. However, while many aspects of web attacks have been carefully studied by researchers and security companies, the reasons that make certain users more "at risk" than others are still unknown. Why do certain users never encounter malicious pages while others seem to end up on them on a daily basis? To answer this question, in this paper we present a comprehensive study on the effectiveness of risk prediction based only on the web browsing behavior of users. Our analysis is based on a telemetry dataset collected by a major antivirus vendor, comprising millions of URLs visited by more than 100,000 users during a period of three months. For each user, we extract detailed usage statistics and distill this information into 74 unique features that model different aspects of the user's behavior. After the features are extracted, we perform a correlation analysis to determine whether any of them correlates with the probability of visiting malicious web pages. We then apply machine learning techniques to build a prediction model that can be used to estimate the risk class of a given user. The results of our experiments show that, by analyzing only their browsing history, it is possible to predict with reasonable accuracy (up to 87%) which users are more likely to become victims of web attacks.
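The abstract does not say which correlation measure the authors used. The following minimal sketch assumes Pearson correlation between each per-user feature and a binary "at risk" label (with a 0/1 label this is the point-biserial coefficient). The log format, the three features, the helper names (`extract_features`, `is_at_risk`, `pearson`, `correlate_features`) and the toy data are all hypothetical illustrations, not the paper's 74-feature set or its telemetry.

```python
from math import sqrt

# HYPOTHETICAL log format: user -> list of (hostname, hour_of_day, flagged_malicious).
# The three features below are illustrative only; they are not the paper's features.


def extract_features(visits):
    """Compute a few illustrative per-user features from a list of visits."""
    total = len(visits)
    distinct_hosts = len({host for host, _, _ in visits})
    night = sum(1 for _, hour, _ in visits if hour < 6)  # visits between 00:00 and 05:59
    return {
        "total_visits": float(total),
        "distinct_hosts": float(distinct_hosts),
        "night_fraction": night / total if total else 0.0,
    }


def is_at_risk(visits):
    """Label a user 1 if at least one visited URL was flagged malicious, else 0."""
    return 1 if any(flag for _, _, flag in visits) else 0


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 ys this equals the point-biserial coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant column; report no correlation
    return cov / (sx * sy)


def correlate_features(logs):
    """Return {feature_name: correlation of that feature with the at-risk label}."""
    users = sorted(logs)
    feats = [extract_features(logs[u]) for u in users]
    labels = [is_at_risk(logs[u]) for u in users]
    return {name: pearson([f[name] for f in feats], labels) for name in feats[0]}


if __name__ == "__main__":
    toy_logs = {  # HYPOTHETICAL toy data, not from the study
        "u1": [("news.example", 9, False), ("mail.example", 10, False)],
        "u2": [("a.example", 2, False), ("b.example", 3, True), ("c.example", 23, False)],
        "u3": [("news.example", 14, False)],
        "u4": [("x.example", 1, True), ("y.example", 4, False),
               ("z.example", 5, False), ("w.example", 0, False)],
    }
    for name, r in correlate_features(toy_logs).items():
        print(f"{name}: r = {r:+.2f}")
```

A positive coefficient only says that, in this toy sample, users with higher values of a feature were more often labelled at risk; it does not establish causation, and the risk-class prediction described in the abstract is a separate, multivariate model.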