How Sensitivity Classification Effectiveness Impacts Reviewers in Technology-Assisted Sensitivity Review

All government documents that are released to the public must first be manually reviewed to identify and protect any sensitive information, e.g. confidential information. However, unassisted manual sensitivity review of born-digital documents is impractical, not least because of the volume of documents that are created. Previous work has shown that sensitivity classification can be effective at predicting whether a document contains sensitive information. Since every released document must still be manually reviewed, it is important to know whether sensitivity classification can assist reviewers in making their sensitivity judgements. Hence, in this paper, we conduct a digital sensitivity review user study to investigate whether the accuracy of sensitivity classification affects the number of documents that a reviewer correctly judges to be sensitive or not (reviewer accuracy) and the time that it takes to review a document for sensitivity (reviewing speed). Our results show that providing reviewers with sensitivity classification predictions from a classifier that achieves 0.7 Balanced Accuracy results in a 38% increase in mean reviewer accuracy and a 72% increase in mean reviewing speed, compared to when reviewers are not provided with predictions. Overall, our findings demonstrate that sensitivity classification is a viable technology for assisting with the sensitivity review of born-digital government documents.
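For readers unfamiliar with the metric, Balanced Accuracy is the mean of the true positive rate and the true negative rate, which makes it less misleading than plain accuracy when sensitive documents are a minority of the collection. The following is a minimal sketch of that standard definition, not the evaluation code used in the study; the function name `balanced_accuracy` and the toy labels are assumptions made for illustration only.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (TPR) and specificity (TNR) for binary labels.

    Labels are assumed to be 1 (sensitive) or 0 (not sensitive).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present in y_true")

    true_positive_rate = tp / positives
    true_negative_rate = tn / negatives
    return 0.5 * (true_positive_rate + true_negative_rate)


# Hypothetical toy data: 2 sensitive and 8 non-sensitive documents.
toy_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]

print(balanced_accuracy(toy_true, toy_pred))  # 0.625
```

On this toy data plain accuracy is 0.7 (7 of 10 correct), while Balanced Accuracy is 0.625, because half of the sensitive documents are missed.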
