Access Control Policy Extraction from Unconstrained Natural Language Text

Although access control mechanisms have existed in computer systems since the 1960s, modern system developers often fail to implement appropriate mechanisms in their systems. Such failures allow individuals, both benign and malicious, to view and manipulate information that they should not be able to access. The goal of our research is to help developers improve security by extracting the access control policies implicitly and explicitly defined in natural language project artifacts. Developers can then verify the extracted access control policies and implement them within a system. We propose a machine-learning-based process that parses existing, unaltered natural language documents, such as requirements or technical specifications, to extract the relevant subjects, actions, and resources for an access control policy. To evaluate our approach, we analyzed a public requirements specification. In classifying sentences as access control or not, we achieved a precision of 0.87 and a recall of 0.91. Through a bootstrapping process utilizing dependency graphs, we correctly identified the subject, action, and object elements of the access control policies with a precision of 0.46 and a recall of 0.54.
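To make the idea of reading policy elements off a dependency graph concrete, the following is a minimal sketch and not the bootstrapping process evaluated above. It assumes a hand-written set of typed dependencies for a hypothetical sentence ("A nurse can order a lab procedure.") and applies a single fixed pattern: a verb with both a nominal subject (`nsubj`) and a direct object (`dobj`) yields one (subject, action, object) candidate. The `Edge` type, the `EDGES` data, and the `extract_triples` function are illustrative names, not part of the described system.

```python
from typing import List, NamedTuple, Tuple


class Edge(NamedTuple):
    """One typed dependency: head --relation--> dependent (hypothetical type)."""
    head: str
    relation: str
    dependent: str


# Hand-written dependencies for the assumed example sentence
# "A nurse can order a lab procedure." (not taken from the evaluated document).
EDGES = [
    Edge("order", "nsubj", "nurse"),
    Edge("order", "aux", "can"),
    Edge("order", "dobj", "procedure"),
    Edge("procedure", "nn", "lab"),
    Edge("nurse", "det", "a"),
    Edge("procedure", "det", "a"),
]


def extract_triples(edges: List[Edge]) -> List[Tuple[str, str, str]]:
    """Return (subject, action, object) candidates for verbs that have
    both an nsubj and a dobj dependent. This is one fixed pattern only."""
    subjects = {}
    objects = {}
    for edge in edges:
        if edge.relation == "nsubj":
            subjects.setdefault(edge.head, []).append(edge.dependent)
        elif edge.relation == "dobj":
            objects.setdefault(edge.head, []).append(edge.dependent)

    triples = []
    for verb, verb_subjects in subjects.items():
        for subject in verb_subjects:
            for obj in objects.get(verb, []):
                triples.append((subject, verb, obj))
    return triples


print(extract_triples(EDGES))  # [('nurse', 'order', 'procedure')]
```

A single hard-coded pattern like this misses passive constructions, prepositional objects, and conjunctions, which is one reason a process that learns additional patterns from the documents themselves is attractive.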
