Statistical Estimation of Malware Detection Metrics in the Absence of Ground Truth

Accurate measurement of security metrics is a critical research problem, because an improper or inaccurate measurement process can undermine the usefulness of the metrics. The problem is especially challenging when the ground truth is unknown or noisy. In this paper, we measure five malware detection metrics in the absence of ground truth, a realistic setting that poses many technical challenges. The ultimate goal is to develop principled, automated methods for measuring these metrics as accurately as possible. We cast the measurement problem as a statistical estimation problem, which naturally calls for an investigation of statistical estimators, and we propose estimators for the five malware detection metrics. By investigating the statistical properties of these estimators, we characterize when they are accurate and which adjustments can improve them under which circumstances. We validate the estimators on synthetic data with known ground truth, and then use them to measure the five metrics on a large data set collected from VirusTotal.
