Inferring User Actions from Provenance Logs

Progger, a kernel-space cloud data provenance logger that provides fine-grained records of data activity, was recently developed to let cloud stakeholders trace data life cycles within and across clouds. Progger logs could allow analysts to infer user actions and build a data-centric behaviour history in a cloud computing environment. However, the logs are complex and noisy, so this potential cannot currently be realised. This paper proposes a statistical approach to efficiently infer user actions from Progger logs. Inferring user actions from logs that capture activity at kernel-level granularity is not straightforward. We address this challenge with an approach that achieves a high level of accuracy. Its key aspects are identifying the data preprocessing steps and selecting attributes. We then apply four standard classification models and identify the one that infers user actions most accurately. To the best of our knowledge, this is the first work of its kind. We also discuss a number of possible extensions to this work. Possible future applications include predicting anomalous security activity before it occurs.
