The Not Yet Exploited Goldmine of OSINT: Opportunities, Open Challenges and Future Trends

The amount of data generated by today's interconnected world is immeasurable, and a large part of it is publicly available, meaning that any user can access it at any time from anywhere on the Internet. Open Source Intelligence (OSINT) is a type of intelligence that benefits from this openness by collecting, processing and correlating data from across cyberspace to generate knowledge. Recent technological advances are driving OSINT to evolve at a dizzying rate. They enable innovative data-driven and AI-powered applications in politics, the economy and society, and they also offer new lines of action against cyberthreats and cybercrime. This paper describes the current state of OSINT and provides a comprehensive review of the paradigm, focusing on the services and techniques that enhance the cybersecurity field. On the one hand, we analyze the strengths of this methodology and propose numerous ways to apply it to cybersecurity. On the other hand, we cover the limitations of adopting it. Since much remains to be explored in this broad field, we also enumerate some open challenges to be addressed in the future. Additionally, we study the role of OSINT in the public sphere of governments, which constitutes an ideal landscape for exploiting open data.
