Quality Assurance for Security Applications of Big Data

The quality of inferences drawn from data, big or small, depends heavily on the quality of the data and of the processes applied to it. Big data analytics is emerging from the laboratory and being applied to meet intelligence and security needs. Confidence in the outcomes of these applications requires a quality assurance framework. This paper outlines the challenges and draws attention to the consequences of misconceived and misapplied projects. It presents key aspects of the necessary risk assessment and risk management approaches, and suggests opportunities for research.