Machine Learning Against Terrorism: How Big Data Collection and Analysis Influences the Privacy-Security Dilemma

Rapid advances in machine learning allow mass surveillance to be applied at ever larger scales and to draw on ever more personal data. These developments demand a reconsideration of the privacy-security dilemma, which describes the tradeoffs between national security interests and individual privacy concerns. By investigating mass surveillance techniques that rely on bulk data collection and machine learning algorithms, we show why these methods are unlikely to pinpoint terrorists in order to prevent attacks. The diverse characteristics of terrorist attacks, especially lone-wolf attacks, lead to irregular and isolated (digital) footprints. This irregularity degrades the accuracy of machine learning algorithms and of the mass surveillance that depends on them, an effect that can be explained by three well-known problems in machine learning theory: class imbalance, the curse of dimensionality, and spurious correlations. Proponents of mass surveillance often invoke the distinction between collecting data and collecting metadata, in which the latter is understood as a lesser breach of privacy. Their arguments commonly overlook the ambiguity in the definitions of data and metadata and ignore the ability of machine learning techniques to infer the former from the latter. Given the sparsity of the datasets used for machine learning in counterterrorism and the privacy risks attendant on bulk data collection, policymakers and other relevant stakeholders should critically re-evaluate the likelihood of success of these algorithms and the data collection on which they depend.
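
The class imbalance problem named in the abstract can be illustrated with a short base-rate calculation. The sketch below is not taken from the paper: the population size, the number of actual targets, and the classifier's sensitivity and specificity are all hypothetical values chosen only to show how a very rare positive class drives down the share of flagged individuals who are true positives.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of flagged individuals who are true positives (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# All figures below are illustrative assumptions, not data from the paper.
population = 10_000_000   # hypothetical number of people under surveillance
actual_targets = 100      # hypothetical number of real attackers among them
sensitivity = 0.99        # assumed fraction of real targets that get flagged
specificity = 0.99        # assumed fraction of innocents that are not flagged

prevalence = actual_targets / population
ppv = positive_predictive_value(prevalence, sensitivity, specificity)

# Expected number of people flagged in total (true and false positives).
expected_flagged = population * (
    prevalence * sensitivity + (1.0 - prevalence) * (1.0 - specificity)
)

print(f"Expected number flagged: {expected_flagged:,.0f}")
print(f"Share of flagged people who are real targets: {ppv:.4%}")
```

Under these assumed numbers, a classifier that is 99% accurate on both classes still flags roughly one hundred thousand people, of whom about one in a thousand is a real target. The exact figures depend entirely on the assumed inputs; the point is only that the result is dominated by the rarity of the positive class.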
