Backdoor detection systems for embedded devices

A system is said to contain a backdoor when it intentionally includes a means to trigger the execution of functionality that subverts its expected security. Unfortunately, such constructs are pervasive in software and systems today, particularly in the firmware of commodity embedded systems and “Internet of Things” devices. The work presented in this thesis addresses the problem of detecting backdoor-like constructs, specifically those present in embedded device firmware, which, as we show, poses additional challenges for devising detection methodologies. The term “backdoor”, while used throughout the academic literature, by industry, and in the media, lacks a rigorous definition, which exacerbates the challenges of detecting backdoors. To this end, we present such a definition, together with a framework that serves as a basis for discovering backdoors, devising new detection techniques, and evaluating the current state of the art. Further, we present two backdoor detection methodologies, along with corresponding tools that implement them. Both methods automate many of the currently manual aspects of backdoor identification and discovery. In both cases, we demonstrate that our approaches are capable of analysing device firmware at scale and can be used to discover previously undocumented real-world backdoors.
