Use of machine learning in big data analytics for insider threat detection

In current enterprise environments, information is becoming more readily accessible across a wide range of interconnected systems. However, the trustworthiness of documents and actors is not explicitly measured, leaving actors unaware of how the latest security events may have affected the trustworthiness of the information being used and of the actors involved. This leads to situations in which information producers give documents to consumers they should not trust, and consumers use information from non-reputable documents or producers. The concepts and technologies developed as part of the Behavior-Based Access Control (BBAC) effort strive to overcome these limitations by accurately calculating the trustworthiness of actors (e.g., from behavior and usage patterns) and of documents (e.g., from provenance and workflow data dependencies). BBAC analyzes a wide range of observables for misbehavior, including network connections, HTTP requests, English text exchanges through emails or chat messages, and edit sequences to documents. The current prototype service strategically combines big data batch processing to train classifiers with real-time stream processing to classify observed behaviors at multiple layers. To scale up to enterprise regimes, BBAC combines clustering analysis with statistical classification in a way that maintains an adjustable number of classifiers.
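
The combination of clustering and classification described above can be illustrated with a small sketch. It is not the BBAC implementation: the class name `ClusteredClassifier`, the two-dimensional feature vectors, and the labels are hypothetical. Plain k-means stands in for the clustering analysis and a nearest-class-mean rule stands in for the statistical classifier. The point it shows is structural: actors are grouped into `k` clusters, one classifier is trained per cluster, and `k` is the knob that adjusts how many classifiers are maintained.

```python
import random
from collections import defaultdict


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def nearest(centroids, p):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda i: dist2(centroids[i], p))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means with seeded random initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(centroids, p)].append(p)
        # Keep the old centroid if a cluster received no points.
        centroids = [mean(groups[i]) if groups[i] else centroids[i]
                     for i in range(k)]
    return centroids


class ClusteredClassifier:
    """One nearest-class-mean classifier per behavior cluster (hypothetical)."""

    def __init__(self, k):
        self.k = k  # adjustable number of clusters, hence of classifiers

    def fit(self, X, y):
        self.centroids = kmeans(X, self.k)
        by_cluster = defaultdict(lambda: defaultdict(list))
        for x, label in zip(X, y):
            by_cluster[nearest(self.centroids, x)][label].append(x)
        # Per cluster: the mean feature vector of each label seen there.
        self.models = {c: {label: mean(pts) for label, pts in labels.items()}
                       for c, labels in by_cluster.items()}
        return self

    def predict(self, x):
        # Route the observation to the closest cluster that has a model,
        # then let that cluster's classifier decide.
        c = min(self.models, key=lambda i: dist2(self.centroids[i], x))
        model = self.models[c]
        return min(model, key=lambda label: dist2(model[label], x))


# Made-up observations: two groups of actors with different baselines.
X = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 3.0), (1.0, 3.2),
     (10.0, 10.0), (10.2, 9.8), (9.9, 10.1), (10.0, 13.0), (10.1, 13.2)]
y = ["normal"] * 3 + ["suspicious"] * 2 + ["normal"] * 3 + ["suspicious"] * 2

model = ClusteredClassifier(k=2).fit(X, y)
print(model.predict((1.0, 3.1)), model.predict((10.0, 10.0)))
```

Because each cluster has its own classifier, the same deviation is judged against the baseline of similar actors rather than against the whole population. Raising or lowering `k` trades model specificity against the number of classifiers that must be trained and served.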
