Network and other computer administrators typically have access to a rich set of logs tracking user actions. However, they often lack metadata such as user role, age, and gender that can provide valuable context for those actions. Inferring user attributes automatically has wide-ranging implications, including for customization (anticipating user needs and priorities), for resource management (anticipating demand), and for security (interpreting anomalous behavior). User attribute profiling has been most widely explored in the context of social media, using lexical and social graph data. Ikeda et al. (2013) use a hybrid system that predicts user demographic information by clustering a user’s followers and followees, in conjunction with a support vector machine that analyzes the user’s tweet history. Oentaryo et al. (2016) use a generalization of logistic regression with social network relationship data to predict user attributes such as age, sex, and religion. Other researchers have considered user attribute profiling in network and cybersecurity domains. Iglesias (2012) presents an unsupervised rule-based classifier that profiles users into learned and adaptive categories based on sequences of Unix shell commands. Meng et al. (2008) learn a mixture model over behavior templates for sequences of network activity. Panda and Patra (2008) explore the effectiveness of several decision tree classifiers and Naive Bayes at classifying network intrusions from sequences of network events. Udoeyop (2010) uses k-means and kernel density estimation to detect normal and abnormal user profiles from fine-grained user network activities, including processes, process times, and file reads, writes, and opens. In the ongoing work described in this abstract, we predict the role of users from computer logs, although our method generalizes trivially to other categorical attributes.
We use recurrent neural networks to make increasingly refined predictions over time, achieving a 30% relative reduction in role classification error over our baseline.
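The idea of a recurrent model that refines its role prediction as more log events arrive can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the architecture, so a plain tanh (Elman-style) recurrent cell is used here for brevity, the weights are random rather than trained, and the class name `TinyRoleRNN`, the integer event encoding, and all sizes are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRoleRNN:
    """Hypothetical Elman-style RNN that emits a role distribution after every event."""

    def __init__(self, n_events, n_hidden, n_roles, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = mat(n_events, n_hidden)   # one input row per event type
        self.w_hh = mat(n_hidden, n_hidden)   # recurrent weights
        self.w_out = mat(n_hidden, n_roles)   # hidden state -> role scores
        self.n_hidden = n_hidden
        self.n_roles = n_roles

    def predict_over_time(self, event_ids):
        """Return one role distribution per observed event, in order."""
        h = [0.0] * self.n_hidden
        predictions = []
        for e in event_ids:
            x = self.w_in[e]
            # New hidden state depends on the current event and the previous state.
            h = [
                math.tanh(x[j] + sum(h[i] * self.w_hh[i][j] for i in range(self.n_hidden)))
                for j in range(self.n_hidden)
            ]
            logits = [
                sum(h[i] * self.w_out[i][k] for i in range(self.n_hidden))
                for k in range(self.n_roles)
            ]
            predictions.append(softmax(logits))
        return predictions


# Assumed toy input: a user's log as a sequence of integer event-type ids.
model = TinyRoleRNN(n_events=4, n_hidden=8, n_roles=3)
preds = model.predict_over_time([0, 2, 2, 1, 3])
```

Each element of `preds` is the model's current belief over roles after seeing the log up to that event, so later entries are conditioned on more of the user's history. With untrained weights the distributions carry no meaning; in practice the parameters would be fit to labeled logs.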
[1] G. G. Stokes. "J." The New Yale Book of Quotations, 1890.
[2] David Lo, et al. Collective Semi-Supervised Learning for User Profiling in Social Media. ArXiv, 2016.
[3] Akaninyene Walter Udoeyop, et al. Cyber Profiling for Insider Threat Detection. 2010.
[4] Teruo Higashino, et al. Twitter user profiling based on text and community mining for market analysis. Knowl. Based Syst., 2013.
[5] Plamen P. Angelov, et al. Creating Evolving User Behavior Profiles Automatically. IEEE Transactions on Knowledge and Data Engineering, 2012.
[6] Manas Ranjan Patra, et al. A Comparative Study of Data Mining Algorithms for Network Intrusion Detection. 2008 First International Conference on Emerging Trends in Engineering and Technology, 2008.
[7] Joshua Glasser, et al. Bridging the Gap: A Pragmatic Approach to Generating Insider Threat Data. 2013 IEEE Security and Privacy Workshops, 2013.
[8] Jürgen Schmidhuber, et al. Long Short-Term Memory. Neural Computation, 1997.
[9] Haifeng Chen, et al. Automatic Profiling of Network Event Sequences: Algorithm and Applications. IEEE INFOCOM 2008 - The 27th Conference on Computer Communications, 2008.