Heterogeneous Temporal Graph Transformer: An Intelligent System for Evolving Android Malware Detection

The explosive growth and increasing sophistication of Android malware call for new defensive techniques to protect mobile users against novel threats. To address this challenge, in this paper we propose and develop an intelligent system named Dr.Droid, which is the first attempt to jointly model malware propagation and evolution for malware detection. In Dr.Droid, we first exploit higher-level semantic and social relations within the ecosystem (e.g., app-market, app-developer, and market-developer relations) to characterize app propagation patterns, and then present a structured heterogeneous graph to model the complex relations among different types of entities. To capture malware evolution, we further consider temporal dependence and introduce a heterogeneous temporal graph that jointly models malware propagation and evolution by extending heterogeneous spatial dependencies along a temporal dimension. We then propose a novel heterogeneous temporal graph transformer framework (denoted HTGT) that integrates both spatial and temporal dependencies while preserving heterogeneity, in order to learn node representations for malware detection. Specifically, to preserve heterogeneity, HTGT uses a heterogeneous spatial transformer that derives heterogeneous attention over each node and edge, learning dedicated representations for different types of entities and relations; to model temporal dependencies, HTGT incorporates a temporal transformer that attentively aggregates the historical sequence of a given node (e.g., an app). The two transformers work iteratively to learn node representations. Promising experimental results on large-scale sample collections from the anti-malware industry demonstrate the effectiveness of Dr.Droid in comparison with state-of-the-art baselines and popular mobile security products.
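To make the two attention stages described above more concrete, the following is a minimal NumPy sketch. It is not the authors' HTGT implementation. It assumes Heterogeneous Graph Transformer (HGT)-style attention: type-specific query/key/value projections (`W_q`, `W_k`, `W_v`, keyed by node type) and a relation-specific transform (`W_rel`, keyed by edge type) for the spatial stage, followed by causal scaled dot-product self-attention over a node's per-snapshot history for the temporal stage. All function names, dimensions, and the toy app/market/developer graph are hypothetical, and only a single spatial-then-temporal pass is shown rather than the iterative stacking the abstract describes.

```python
import numpy as np

# Hypothetical sketch, NOT the authors' HTGT code: HGT-style typed attention
# within one snapshot, then causal self-attention over a node's history.


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def hetero_spatial_attention(h_tgt, tgt_type, neighbors, W_q, W_k, W_v, W_rel):
    """Aggregate the typed neighbors of one target node within one snapshot.

    neighbors: list of (h_src, src_type, rel_type) tuples.
    W_q, W_k, W_v: dict node_type -> (d, d) type-specific projection.
    W_rel: dict rel_type -> (d, d) relation-specific transform (assumed).
    Returns the aggregated vector and the attention weights over neighbors.
    """
    d = h_tgt.shape[0]
    q = W_q[tgt_type] @ h_tgt
    keys, values = [], []
    for h_src, src_type, rel in neighbors:
        keys.append(W_rel[rel] @ (W_k[src_type] @ h_src))
        values.append(W_rel[rel] @ (W_v[src_type] @ h_src))
    K, V = np.stack(keys), np.stack(values)  # (n, d) each
    alpha = softmax(K @ q / np.sqrt(d))      # (n,) weights, sum to 1
    return alpha @ V, alpha


def temporal_attention(H, W_q, W_k, W_v):
    """Causal scaled dot-product self-attention over H of shape (T, d)."""
    T, d = H.shape
    Q, K, V = H @ W_q, H @ W_k, H @ W_v
    scores = Q @ K.T / np.sqrt(d)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # hide later snapshots
    scores = np.where(future, -np.inf, scores)
    return softmax(scores, axis=-1) @ V


# Toy usage: one app linked to a market and a developer over T snapshots.
rng = np.random.default_rng(0)
d, T = 4, 3
node_types = ["app", "market", "developer"]
rel_types = ["app-market", "app-developer"]


def rand_mat():
    return rng.normal(size=(d, d)) / np.sqrt(d)


W_q = {t: rand_mat() for t in node_types}
W_k = {t: rand_mat() for t in node_types}
W_v = {t: rand_mat() for t in node_types}
W_rel = {r: rand_mat() for r in rel_types}

per_snapshot, alphas = [], []
for _ in range(T):
    h_app = rng.normal(size=d)
    nbrs = [(rng.normal(size=d), "market", "app-market"),
            (rng.normal(size=d), "developer", "app-developer")]
    z, a = hetero_spatial_attention(h_app, "app", nbrs, W_q, W_k, W_v, W_rel)
    per_snapshot.append(z)
    alphas.append(a)

H = np.stack(per_snapshot)  # (T, d) spatially aggregated history of the app
Wt_q, Wt_k, Wt_v = rand_mat(), rand_mat(), rand_mat()
Z = temporal_attention(H, Wt_q, Wt_k, Wt_v)
app_repr = Z[-1]  # latest-snapshot representation, e.g. input to a classifier
```

The causal mask is a design assumption: it lets each snapshot attend only to itself and earlier snapshots, so a representation at time t never uses future information. Stacking several such spatial/temporal pairs would approximate the iterative interaction between the two transformers.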
