A Grey-Box Approach for Detecting Malicious User Interactions in Web Applications

Web applications are the core enabler for most Internet services today. Their standard interfaces allow them to be composed together in different ways in order to support different service workflows. While the modular composition of applications has considerably simplified the provisioning of new Internet services, it has also added new security challenges; the impact of a security breach propagating through the chain far beyond the vulnerable application. To secure web applications, two distinct approaches have been commonly used in the literature. First, white-box approaches leverage the source code in order to detect and fix unintended flaws. Although they cover well the intrinsic flaws within each application, they can barely leverage logic flaws that arise when connecting multiple applications within the same service. On the other hand, black-box approaches analyze the workflow of a service through a set of user interactions, while assuming only little information about its embedded applications. These approaches may have a better coverage, but suffer from a high false positives rate. So far, to the best of our knowledge, there is not yet a single solution that combines both approaches into a common framework. In this paper, we present a new grey-box approach that leverages the advantages of both white-box and black-box. The core component of our system is a semi-supervised learning framework that first learns the nominal behavior of the service using a set of elementary user interactions, and then prune this nominal behavior from attacks that may have occurred during the learning phase. To do so, we leverage a graph-based representation of known attack scenarios that is built using a white-box approach. We demonstrate in this paper the use of our system through a practical use case, including real world attack scenarios that we were able to detect and qualify using our approach.

[1]  Alessandro Armando,et al.  From Model-Checking to Automated Testing of Security Protocols: Bridging the Gap , 2012, TAP@TOOLS.

[2]  Engin Kirda,et al.  Have things changed now? An empirical study on input validation vulnerabilities in web applications , 2012, Comput. Secur..

[3]  Alessandro Armando,et al.  Attack Patterns for Black-Box Security Testing of Multi-Party Web Applications , 2016, NDSS.

[4]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[5]  Sriram K. Rajamani,et al.  The SLAM project: debugging system software via static analysis , 2002, POPL '02.

[6]  Barbara Carminati,et al.  Web Service Composition: A Security Perspective , 2005, International Workshop on Challenges in Web Information Retrieval and Integration.

[7]  Jun Sun,et al.  AUTHSCAN: Automatic Extraction of Web Authentication Protocols from Implementations , 2013, NDSS.

[8]  Christopher Krügel,et al.  ZigZag: Automatically Hardening Web Applications Against Client-side Validation Vulnerabilities , 2015, USENIX Security Symposium.

[9]  Christopher Krügel,et al.  Toward Automated Detection of Logic Vulnerabilities in Web Applications , 2010, USENIX Security Symposium.

[10]  James H. Martin,et al.  Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition , 2000, Prentice Hall series in artificial intelligence.

[11]  Sushil Jajodia,et al.  Correlating intrusion events and building attack scenarios through attack graph distances , 2004, 20th Annual Computer Security Applications Conference.

[12]  Christopher Krügel,et al.  Cross Site Scripting Prevention with Dynamic Data Tainting and Static Analysis , 2007, NDSS.

[13]  Samuel Marchal,et al.  Know Your Phish: Novel Techniques for Detecting Phishing Sites and Their Targets , 2015, 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS).

[14]  Eugene M. Luks,et al.  Isomorphism of graphs of bounded valence can be tested in polynomial time , 1980, 21st Annual Symposium on Foundations of Computer Science (sfcs 1980).

[15]  Dawson R. Engler,et al.  Execution Generated Test Cases: How to Make Systems Code Crash Itself , 2005, SPIN.

[16]  Ke Wang,et al.  Fileprints: identifying file types by n-gram analysis , 2005, Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop.

[17]  Christopher Krügel,et al.  Enemy of the State: A State-Aware Black-Box Web Vulnerability Scanner , 2012, USENIX Security Symposium.

[18]  Ralf Küsters,et al.  SPRESSO: A Secure, Privacy-Respecting Single Sign-On System for the Web , 2015, CCS.

[19]  Giovanni Vigna,et al.  Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications , 2007, RAID.

[20]  Roberto Perdisci,et al.  WebWitness: Investigating, Categorizing, and Mitigating Malware Download Paths , 2015, USENIX Security Symposium.

[21]  John C. Mitchell,et al.  State of the Art: Automated Black-Box Web Application Vulnerability Testing , 2010, 2010 IEEE Symposium on Security and Privacy.

[22]  Gianluca Stringhini,et al.  EVILCOHORT: Detecting Communities of Malicious Accounts on Online Services , 2015, USENIX Security Symposium.

[23]  Xiang Fu,et al.  A Static Analysis Framework For Detecting SQL Injection Vulnerabilities , 2007, 31st Annual International Computer Software and Applications Conference (COMPSAC 2007).

[24]  Steve Hanna,et al.  A Symbolic Execution Framework for JavaScript , 2010, 2010 IEEE Symposium on Security and Privacy.

[25]  Julian R. Ullmann,et al.  An Algorithm for Subgraph Isomorphism , 1976, J. ACM.

[26]  Kalyan Veeramachaneni,et al.  AI^2: Training a Big Data Machine to Defend , 2016, 2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS).

[27]  Inderjit S. Dhillon,et al.  Information-theoretic co-clustering , 2003, KDD '03.

[28]  Xiaowei Li,et al.  BLOCK: a black-box approach for detection of state violation attacks towards web applications , 2011, ACSAC '11.

[29]  Pavel Berkhin,et al.  A Survey of Clustering Data Mining Techniques , 2006, Grouping Multidimensional Data.

[30]  Patrice Godefroid,et al.  Automated Whitebox Fuzz Testing , 2008, NDSS.

[31]  W. B. Cavnar,et al.  N-gram-based text categorization , 1994 .

[32]  Vlado Keselj,et al.  N-gram-based detection of new malicious code , 2004, Proceedings of the 28th Annual International Computer Software and Applications Conference, 2004. COMPSAC 2004..

[33]  Giovanni Vigna,et al.  Multi-module vulnerability analysis of web-based applications , 2007, CCS '07.

[34]  Davide Balzarotti,et al.  Toward Black-Box Detection of Logic Flaws in Web Applications , 2014, NDSS.

[35]  Aurélien Francillon,et al.  The role of web hosting providers in detecting compromised websites , 2013, WWW '13.

[36]  Mario Vento,et al.  A (sub)graph isomorphism algorithm for matching large graphs , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.