Characterizing Regulatory Documents and Guidelines Based on Text Mining

Implementing the rules, constraints, and requirements contained in regulatory documents such as standards or guidelines is a mandatory task for organizations and institutions across several domains. Because of the amount of domain-specific information and actions encoded in these documents, organizations often need to establish cooperation between several departments and consulting experts to guide managers and employees in eliciting compliance requirements. The aim of this paper is to provide computer-based guidance and support for this often costly and tedious compliance task. The presented methodology uses well-known text mining techniques and clustering algorithms to classify (families of) documents according to topics and to derive significant sentences that support users in understanding and implementing compliance-related documents. Applying the approach to collections of documents from the security and medical domains demonstrates that text mining is a promising domain-independent means of supporting the understanding, extraction, and analysis of regulatory documents.
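
The general idea of grouping text by topic and surfacing a significant sentence per group can be illustrated with a minimal sketch. It is not the authors' pipeline: the TF-IDF weighting, the cosine-based (spherical) k-means with a naive "first k items" initialization, the choice of the sentence closest to each centroid as "significant", and the four toy sentences are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one L2-normalized TF-IDF vector (a dict) per input text."""
    counts = [Counter(tokenize(t)) for t in texts]
    n = len(texts)
    doc_freq = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        vec = {t: tf * math.log(n / doc_freq[t]) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two sparse vectors (cosine if both are unit length)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(vectors, k, iterations=20):
    """Cosine-based k-means; the first k vectors seed the centroids (assumption)."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to the most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Recompute each centroid as the normalized sum of its members.
        for c in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    for t, w in v.items():
                        total[t] += w
            if total:
                norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
                centroids[c] = {t: w / norm for t, w in total.items()}
    return labels, centroids


def significant_sentences(texts, vectors, labels, centroids):
    """Pick, per cluster, the text most similar to the cluster centroid."""
    best = {}
    for text, v, label in zip(texts, vectors, labels):
        score = cosine(v, centroids[label])
        if label not in best or score > best[label][0]:
            best[label] = (score, text)
    return {label: text for label, (score, text) in best.items()}


# Hypothetical toy sentences, loosely themed on security and medical guidelines.
sentences = [
    "Passwords must be changed regularly and access control must be enforced.",
    "Patients with melanoma should receive surgical excision and follow-up treatment.",
    "Access to the network requires password authentication and access logging.",
    "Treatment of melanoma patients includes surgical excision of the tumor.",
]

vectors = tfidf_vectors(sentences)
labels, centroids = spherical_kmeans(vectors, k=2)
representatives = significant_sentences(sentences, vectors, labels, centroids)

for label, sentence in sorted(representatives.items()):
    print(label, sentence)
```

On real regulatory documents, the number of clusters would have to be chosen with some validation criterion rather than fixed by hand, and the initialization would need to be more robust than taking the first k items.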
