Fuzzy Hashing on Firmwares Images: A Comparative Analysis

With the rapid development of Internet of Things (IoT) technology, attacks against IoT devices are increasing, and IoT security has attracted considerable attention. Because different IoT firmware images widely reuse the same code, code similarity comparison is an important technique for IoT security analysis. Fuzzy hashing generates a fingerprint of a file by converting the file into an intermediate representation and hashing that representation; comparing two fingerprints then gives a similarity estimate for files that are similar but not identical. In this article, we analyze and compare today's most widely used fuzzy hashing tools and classify them in detail. We also analyze the advantages and disadvantages of the algorithms these tools use. Finally, we select a few of the most convincing fuzzy hashing tools, such as ssdeep and TLSH, and compare their performance experimentally on real firmware datasets.
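To make the fingerprint-then-compare idea concrete, the following is a minimal sketch of a context-triggered piecewise hash in the spirit of the approach ssdeep is based on. It is a toy illustration only, not the algorithm of ssdeep, TLSH, or any tool evaluated in the article. The names `toy_fuzzy_hash` and `similarity`, the window size, the chunking modulus, and the use of SHA-256 and `difflib` are all assumptions chosen for brevity.

```python
import hashlib
from difflib import SequenceMatcher

# Assumed toy parameters; real tools choose and adapt these differently.
WINDOW = 7     # bytes of context that decide where a chunk ends
MODULUS = 64   # a boundary fires on average once every MODULUS bytes
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def _chunk_char(chunk: bytes) -> str:
    """Map one chunk to a single fingerprint character."""
    return ALPHABET[hashlib.sha256(chunk).digest()[0] % len(ALPHABET)]


def toy_fuzzy_hash(data: bytes) -> str:
    """Split data at content-defined boundaries and emit one char per chunk."""
    chars = []
    start = 0
    for i in range(len(data)):
        # Sum of the last WINDOW bytes acts as a (very simple) rolling hash.
        window = data[max(0, i - WINDOW + 1): i + 1]
        if sum(window) % MODULUS == MODULUS - 1:
            chars.append(_chunk_char(data[start: i + 1]))
            start = i + 1
    if start < len(data):
        # Trailing bytes after the last boundary form the final chunk.
        chars.append(_chunk_char(data[start:]))
    return "".join(chars)


def similarity(fp_a: str, fp_b: str) -> float:
    """Score two fingerprints in [0, 1]; 1.0 means identical fingerprints."""
    return SequenceMatcher(None, fp_a, fp_b).ratio()


# Usage: a locally patched blob should score closer to the original
# than an unrelated blob does.
base = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))
patched = base[:2000] + b"\x00" * 16 + base[2016:]
unrelated = b"".join(hashlib.sha256(bytes([i, 1])).digest() for i in range(128))

print(similarity(toy_fuzzy_hash(base), toy_fuzzy_hash(patched)))
print(similarity(toy_fuzzy_hash(base), toy_fuzzy_hash(unrelated)))
```

Because chunk boundaries depend only on nearby bytes, a local edit disturbs only the fingerprint characters of the chunks around it, which is why the patched blob keeps most of its fingerprint. Production tools differ in the rolling hash, the block-size selection, and the distance measure.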
