Big Data Forensics: Hadoop Distributed File Systems as a Case Study

Big Data has fast become one of the most widely adopted paradigms in computer science, and it poses an equally significant challenge for forensic investigators. The Hadoop Distributed File System (HDFS) is one of the most popular big data platforms on the market, offering strong support for parallel processing and data analytics. However, HDFS is not without its risks, having reportedly been targeted by cyber criminals as a means of stealing and exfiltrating confidential data. Using HDFS as a case study, we aim to detect remnants of malicious users’ activities within the HDFS environment. Our examination involves a thorough analysis of different areas of the HDFS environment, including a range of log files and disk images. Our experimental environment comprised four virtual machines, all running Ubuntu. This research provides a thorough understanding of the types of forensically relevant artefacts that are likely to be found during a forensic investigation of HDFS.
