Random Forest Classification for Detecting Android Malware

Internet connected smartphone devices play a crucial role in the application domain of Internet of Things. These devices are being widely used for day-to-day activities such as remotely controlling lighting and heating at homes, paying for parking, and recently for paying for goods using saved credit card information using Near Field Communication (NFC). Android is the most popular smartphone platform today. It is also the choice of malware authors to obtain secure and private data. In this paper we exclusively apply the machine learning ensemble learning algorithm Random Forest supervised classifier on an Android feature dataset of 48919 points of 42 features each. Our goal was to measure the accuracy of Random Forest in classifying Android application behavior to classify applications as malicious or benign. Moreover, we wanted to focus on detection accuracy as the free parameters of the Random Forest algorithm such as the number of trees, depth of each tree and number of random features selected are varied. Our experimental results based on 5-fold cross validation of our dataset shows that Random Forest performs very well with an accuracy of over 99 percent in general, an optimal Out-Of-Bag (OOB) error rate [3] of 0.0002 for forests with 40 trees or more, and a root mean squared error of 0.0171 for 160 trees.