GraphEvolveDroid: Mitigate Model Degradation in the Scenario of Android Ecosystem Evolution

Machine learning-based Android malware detection models suffer from model degradation over time as the ecosystem evolves: models trained on historical data perform poorly on newly arrived data. Existing solutions address this problem through sophisticated feature engineering aimed at finding stable features, which is labor-intensive. In this paper, we mitigate model degradation by switching the representation paradigm from Euclidean (vectors) to non-Euclidean (graphs) while leaving the features unchanged, and we propose a graph-based Android malware detection model called GraphEvolveDroid. Specifically, we first construct a directed evolutionary network with the k-nearest neighbors (KNN) algorithm, in which each node represents an app and the source node of each edge is an ancestor of its target node. We then use stacked GCN layers to propagate information from ancestor nodes to child nodes, which suppresses the distribution shift of the child nodes. Experimental results on a large real-world dataset spanning three years demonstrate that GraphEvolveDroid can significantly mitigate model degradation by slowing down the shift in data distribution.
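The two steps described above (building a directed ancestor-to-child graph with KNN, then propagating information along it with stacked GCN layers) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation. Several details are assumptions made here for illustration: every app has a feature vector and an appearance timestamp; an app's ancestors are its `k` nearest neighbors by Euclidean distance among apps with a strictly earlier timestamp; and each layer averages a node's own features with those of its direct ancestors (row-normalized aggregation with self-loops) instead of using a specific published normalization. The function names `build_evolution_graph`, `gcn_layer`, and `forward` are hypothetical.

```python
import numpy as np


def build_evolution_graph(features, timestamps, k):
    """Return adjacency A where A[i, j] = 1 means app i is an ancestor of app j.

    Assumption (illustrative): the ancestors of app j are its k nearest
    neighbours (Euclidean distance) among apps with a strictly earlier timestamp.
    """
    n = features.shape[0]
    adj = np.zeros((n, n))
    for j in range(n):
        earlier = np.flatnonzero(timestamps < timestamps[j])
        if earlier.size == 0:
            continue  # the oldest apps have no ancestors
        dists = np.linalg.norm(features[earlier] - features[j], axis=1)
        nearest = earlier[np.argsort(dists)[:k]]
        adj[nearest, j] = 1.0  # directed edge: ancestor -> child
    return adj


def gcn_layer(adj, h, weight, activate=True):
    """One GCN-style layer: each node averages itself and its direct ancestors.

    Assumption: row-normalized mean aggregation with self-loops.
    """
    # Transpose so that row j gathers messages from j's ancestors; add self-loops.
    a_hat = adj.T + np.eye(adj.shape[0])
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)
    out = a_norm @ h @ weight
    return np.maximum(out, 0.0) if activate else out  # ReLU on hidden layers


def forward(adj, x, weights):
    """Stack layers so information flows along multi-hop ancestor chains."""
    h = x
    for i, w in enumerate(weights):
        # No activation on the last layer, so it can serve as class scores.
        h = gcn_layer(adj, h, w, activate=i < len(weights) - 1)
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 4))            # 6 apps, 4 features each
    t = np.array([0, 0, 1, 1, 2, 2])       # appearance periods
    adj = build_evolution_graph(x, t, k=2)
    weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 2))]
    print(forward(adj, x, weights).shape)  # (6, 2): two scores per app
```

Restricting candidate ancestors to earlier apps keeps every edge pointing forward in time, so after `L` stacked layers a new app's representation mixes in features from up to `L` generations of older apps. This mirrors the intuition in the abstract that ancestor information dampens the drift of newly arrived samples.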