Accelerating Dependency Graph Learning from Heterogeneous Categorical Event Streams via Knowledge Transfer

A dependency graph is a heterogeneous graph that represents the intrinsic relationships between pairs of system entities, and it is essential to many data analysis applications, such as root cause diagnosis and intrusion detection. Given a well-trained dependency graph from a source domain and an immature dependency graph from a target domain, how can we extract the entity and dependency knowledge from the source to enhance the target? One option is to apply the mature dependency graph learned in the source domain directly to the target domain. However, because of the domain variety problem, directly using the source dependency graph often cannot achieve good performance. Traditional transfer learning methods mainly focus on numerical data and are not applicable to categorical event streams. In this paper, we propose ACRET, a knowledge-transfer-based model for accelerating dependency graph learning from heterogeneous categorical event streams. In particular, we first propose an entity estimation model that uses entity embedding and manifold learning to filter out irrelevant entities from the source domain. Only the entities with statistically high correlations are transferred to the target domain. On the surviving entities, we propose a dependency construction model that constructs unbiased dependency relationships by solving a two-constraint optimization problem. Experimental results on synthetic and real-world datasets demonstrate the effectiveness and efficiency of ACRET. We also apply ACRET to a real enterprise security system for intrusion detection, where it achieves superior detection performance at least 20 days in advance with more than 70% accuracy.
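The two-stage idea described above (filter source entities, then build dependencies only on the survivors) can be illustrated with a deliberately simplified sketch. This is not ACRET itself: a plain cosine-similarity threshold stands in for the entity embedding and manifold learning step, and a simple edge filter stands in for the two-constraint optimization. All names, embeddings, and the 0.8 threshold are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def filter_entities(source_emb, target_emb, threshold=0.8):
    """Stage 1 (stand-in): keep source entities that resemble some target entity.

    Assumption: similarity is a cosine threshold on given embeddings,
    not the paper's entity estimation model.
    """
    kept = set()
    for name, vec in source_emb.items():
        best = max((cosine(vec, t) for t in target_emb.values()), default=0.0)
        if best >= threshold:
            kept.add(name)
    return kept


def transfer_edges(source_edges, kept, target_edges):
    """Stage 2 (stand-in): add source edges whose endpoints both survived.

    Assumption: a plain set union replaces the paper's optimization step.
    """
    merged = set(target_edges)
    for u, v in source_edges:
        if u in kept and v in kept:
            merged.add((u, v))
    return merged


# Hypothetical toy data: three entities are relevant, one is not.
source_emb = {
    "sshd": [1.0, 0.0],
    "bash": [0.9, 0.1],
    "python": [0.6, 0.8],
    "printer_daemon": [0.0, 1.0],
}
target_emb = {"sshd": [1.0, 0.05], "bash": [0.8, 0.2], "python": [0.7, 0.7]}
source_edges = {("sshd", "bash"), ("bash", "python"), ("bash", "printer_daemon")}
target_edges = {("sshd", "bash")}

kept = filter_entities(source_emb, target_emb)
merged = transfer_edges(source_edges, kept, target_edges)
print(sorted(kept))
print(sorted(merged))
```

In this toy run the immature target graph gains the source dependency between `bash` and `python`, while the edge touching the filtered-out `printer_daemon` entity is not transferred.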
