PANDORA: A Scalable and Efficient Scheme to Extract Version of Binaries in IoT Firmwares

Open source components are widely used by IoT vendors to develop device firmware. The disclosure of vulnerabilities affecting specific versions of core components can cause severe security incidents, such as Heartbleed in 2014 and SambaCry in 2017. Extracting version information from firmware binaries is therefore important for assessing the impact of such incidents and for providing emergency response services. To the best of our knowledge, no scalable and efficient method yet exists for extracting binary version information from IoT firmware. The method commonly used for traditional software requires running the program and interacting with it, for example by passing '-version', to obtain the version information. This method is not applicable to IoT devices: they are built on a wide variety of platforms, which makes it infeasible to emulate all firmware of interest at large scale. In this paper, we design, implement, and evaluate PANDORA, a scalable and efficient binary version extraction framework for IoT firmware that does not rely on a real runtime environment. The main idea of our methodology is to leverage the version strings embedded in binaries to obtain version information. We design a string recover engine (SRE) to recover the missing pieces of incomplete version strings. We test PANDORA on a dataset containing 2683 IoT binary files. Surprisingly, 2267 of them are version-extractable, a recognition rate of 84.5%.
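
To make the core idea concrete, the following is a minimal sketch of static version-string extraction. It is not PANDORA's implementation: the function names, the regular expression, and the sample bytes are all assumptions chosen for illustration. It scans raw bytes for printable strings, much as the Unix `strings` tool does, and keeps those that contain a dotted version number.

```python
import re

def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]

# Hypothetical pattern: dotted numbers with an optional letter suffix,
# e.g. "1.0.1f" or "4.5.16". Real version formats vary far more widely.
VERSION_RE = re.compile(r"\b\d+\.\d+(?:\.\d+)*[a-z]?\b")

def candidate_versions(data: bytes):
    """Pair each printable string with every version-like token it contains."""
    hits = []
    for s in printable_strings(data):
        for m in VERSION_RE.finditer(s):
            hits.append((s, m.group()))
    return hits

# Made-up bytes standing in for a firmware binary: one complete version
# string and one format string whose numbers are only filled in at run time.
sample = b"\x7fELF\x00\x01\x02OpenSSL 1.0.1f 6 Jan 2014\x00\x00%s version %d.%d\x00"

for context, version in candidate_versions(sample):
    print(version, "<-", context)
```

The second string in the sample, `%s version %d.%d`, contains no literal digits, so a plain pattern match finds nothing in it. Incomplete strings of this kind are the case the abstract assigns to the SRE, and this sketch does not attempt to recover them.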
