Paper information - Rebuilding the Tower of Babel: Towards Cross-System Malware Information Sharing


Anti-virus systems developed by different vendors often demonstrate strong discrepancies in how they name malware, which significantly hinders malware information sharing. While existing work has proposed a plethora of malware naming standards, most anti-virus vendors have been reluctant to change their own naming conventions. In this paper, we explore a new, more pragmatic alternative. We propose to exploit the correlation between the malware naming of different anti-virus systems to create their consensus classification, through which these systems can share malware information without modifying their naming conventions. Specifically, we present Latin, a novel classification integration framework that leverages the correspondence between participating anti-virus systems as reflected in heterogeneous information sources at the instance-instance, instance-name, and name-name levels. We provide results from extensive experimental studies using real malware datasets and concrete use cases to verify the efficacy of Latin in supporting cross-system malware information sharing.
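To make the idea of name-name correspondence concrete, the following is a minimal sketch and not the Latin framework itself, whose algorithm is not described on this page. It assumes each anti-virus system is given as a mapping from sample identifier to assigned name, and it scores every pair of names by the Jaccard overlap of the samples they label. The function name `name_correspondence`, the vendor labels, and the sample identifiers are all hypothetical.

```python
from collections import defaultdict


def name_correspondence(labels_a, labels_b):
    """Score name pairs from two systems by Jaccard overlap of their samples.

    labels_a, labels_b: dicts mapping sample id -> name assigned by a system.
    Only samples labeled by both systems are considered.
    Returns a dict {(name_a, name_b): score} for pairs sharing >= 1 sample.
    """
    members_a = defaultdict(set)
    members_b = defaultdict(set)

    # Group the shared samples under the name each system gives them.
    for sample in labels_a.keys() & labels_b.keys():
        members_a[labels_a[sample]].add(sample)
        members_b[labels_b[sample]].add(sample)

    scores = {}
    for name_a, set_a in members_a.items():
        for name_b, set_b in members_b.items():
            overlap = len(set_a & set_b)
            if overlap:
                scores[(name_a, name_b)] = overlap / len(set_a | set_b)
    return scores


# Hypothetical labels from two made-up vendors for four samples.
vendor_a = {"s1": "Worm.Alpha", "s2": "Worm.Alpha",
            "s3": "Trojan.Beta", "s4": "Trojan.Beta"}
vendor_b = {"s1": "W32/Aleph", "s2": "W32/Aleph",
            "s3": "W32/Bet", "s4": "W32/Aleph"}

scores = name_correspondence(vendor_a, vendor_b)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A high score suggests that two differently named families largely cover the same samples. A consensus classification of the kind the abstract describes would need to combine such evidence with instance-instance and instance-name information, which this sketch does not attempt.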

Wei Gao | Shicong Meng | Ting Wang | Xin Hu

[1] Jianhua Lin, et al. Divergence measures based on the Shannon entropy, 1991, IEEE Trans. Inf. Theory.

[2] William W. Cohen, et al. Power Iteration Clustering, 2010, ICML.

[3] Carsten Willems, et al. Automatic analysis of malware behavior using machine learning, 2011, J. Comput. Secur.

[4] Tom Kelchner. The (in)consistent naming of malcode, 2010.

[5] Edwin R. Hancock, et al. Spectral Clustering of Graphs, 2003, GbRPR.

[6] Zhuoqing Morley Mao, et al. Automated Classification and Analysis of Internet Malware, 2007, RAID.

[7] Vincent Kanade, et al. Clustering Algorithms, 2021, Wireless RF Energy Transfer in the Massive IoT Era.

[8] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[9] Erhard Rahm, et al. Similarity flooding: a versatile graph matching algorithm and its application to schema matching, 2002, Proceedings 18th International Conference on Data Engineering.

[10] Rajeev Motwani, et al. Robust and efficient fuzzy match for online data cleaning, 2003, SIGMOD '03.

[11] Philip S. Yu, et al. Combining multiple clusterings by soft correspondence, 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[12] David Harley, et al. A DOSE BY ANY OTHER NAME, 2008.

[13] J L Marx, et al. A virus by any other name . . ., 1985, Science.

[14] Erhard Rahm, et al. Generic Schema Matching with Cupid, 2001, VLDB.

[15] Somesh Jha, et al. A semantics-based approach to malware detection, 2007, POPL '07.

[16] Pedro M. Domingos, et al. Reconciling schemas of disparate data sources: a machine-learning approach, 2001, SIGMOD '01.

[17] Howard B. Newcombe, et al. Record linkage: making maximum use of the discriminating power of identifying information, 1962, CACM.

[18] Ting Wang, et al. SeMap: a generic mapping construction system, 2008, EDBT '08.

[19] Yong Chen, et al. Automatic malware categorization using cluster ensemble, 2010, KDD.

[20] Stefano Zanero, et al. Finding Non-trivial Malware Naming Inconsistencies, 2011, ICISS.

[21] Erhard Rahm, et al. Generic schema matching, ten years later, 2011, Proc. VLDB Endow.

[22] Fausto Giunchiglia, et al. Semantic Matching: Algorithms and Implementation, 2007, J. Data Semant.