Finding changed and similar functions between a pair of binaries is an important problem in malware attribution and in the identification of new malware capabilities. This paper presents a new technique for this problem called Function Similarity using Family Context (FSFC). FSFC trains a Support Vector Machine (SVM) model using pairs of similar functions from two program variants. The method improves upon previous research, Cross Version Contextual Function Similarity (CVCFS), by representing a function using features extracted not just from the function itself but also from other functions with which it has a caller or callee relationship. We present the results of an initial experiment showing that the use of additional features from the context of a function significantly decreases the false positive rate, obviating the need for a separate pass to clean false positives. A more surprising finding is that the SVM model produced by FSFC can abstract function similarity features from one pair of program variants to find similar functions in an unrelated pair of program variants. If validated by a larger study, this property opens the possibility of creating generic similar-function classifiers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra.
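To make the idea of context features concrete, the following minimal Python sketch combines a function's own features with those of its callers and callees, then builds a difference vector for a candidate function pair. It is an illustration under assumptions, not the FSFC implementation: the toy program, the instruction-count features, the `self:`/`caller:`/`callee:` prefixes, and the helper names `callers_of`, `context_features`, and `pair_vector` are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy program: per-function feature counts and a call graph.
# These features are illustrative only, not the feature set used by FSFC.
features = {
    "main": Counter({"call": 2, "push": 3}),
    "parse": Counter({"cmp": 4, "jmp": 2}),
    "send": Counter({"call": 1, "xor": 5}),
}
calls = {"main": ["parse", "send"], "parse": [], "send": []}  # caller -> callees


def callers_of(name, calls):
    """Return the functions that call `name`."""
    return [f for f, callees in calls.items() if name in callees]


def context_features(name, features, calls):
    """Combine a function's own features with those of its callers and callees."""
    combined = Counter()
    for key, count in features[name].items():
        combined["self:" + key] += count
    for callee in calls[name]:
        for key, count in features[callee].items():
            combined["callee:" + key] += count
    for caller in callers_of(name, calls):
        for key, count in features[caller].items():
            combined["caller:" + key] += count
    return combined


def pair_vector(a, b, vocabulary):
    """Absolute per-feature differences for a candidate function pair."""
    # Counter returns 0 for features that one side does not have.
    return [abs(a[k] - b[k]) for k in vocabulary]


a = context_features("parse", features, calls)
b = context_features("send", features, calls)
vocabulary = sorted(set(a) | set(b))
vector = pair_vector(a, b, vocabulary)
```

In a setup like the one the abstract describes, vectors of this kind, labelled as similar or dissimilar pairs, would form the training data for an SVM. Here `parse` and `send` share the same caller, so their caller-derived entries cancel out and only their own features differ.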