How the Conceptual Modelling Improves the Security on Document Databases

Big Data is becoming a prominent trend in our society. Ever larger amounts of data, including sensitive and personal information, are being loaded into NoSQL and other Big Data technologies for analysis and processing. However, current security approaches do not take into account the special characteristics of these technologies, leaving sensitive and personal data unprotected, which risks severe monetary losses and brand damage. In this paper, we focus on securing document databases and propose a framework comprising three stages: (1) the source data set is analysed using Natural Language Processing techniques and ontological resources in order to detect sensitive data; (2) a metamodel for document databases allows designers to specify both structural and security aspects; and (3) the resulting model is automatically implemented in a specific document database tool, MongoDB. Finally, we apply the proposed framework to a case study with a data set from the medical domain. The main advantages of our framework are that (1) the effort required to secure the data is reduced, as part of the process is automated; (2) it can be easily applied to other NoSQL technologies by adapting the metamodel and transformations; and (3) it is aligned with existing standards, thus facilitating the application of recommendations and best practices.
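To make the three stages more concrete, the following is a minimal sketch in Python and not the framework's actual implementation. It assumes a hand-written keyword lexicon (`SENSITIVE_TERMS`) as a stand-in for the NLP and ontology-based detection, and it reduces the security model to a list of fields to hide. All function, collection, view, and role names are hypothetical. The generated dictionaries follow the shape of MongoDB's `create` (view) and `createRole` database commands; no database connection is made.

```python
# Hypothetical lexicon standing in for the NLP/ontology detection stage.
SENSITIVE_TERMS = {"diagnosis", "ssn", "patient_name"}


def detect_sensitive_fields(document):
    """Stage 1 (simplified): return top-level field names found in the lexicon."""
    return sorted(name for name in document if name.lower() in SENSITIVE_TERMS)


def build_view_command(collection, view_name, hidden_fields):
    """Stage 3 (simplified): a MongoDB `create` command for a view that
    projects the hidden fields out of the source collection."""
    pipeline = []
    if hidden_fields:
        # An exclusion projection: each listed field is dropped from the view.
        pipeline.append({"$project": {name: 0 for name in hidden_fields}})
    return {"create": view_name, "viewOn": collection, "pipeline": pipeline}


def build_role_command(db, view_name, role_name):
    """Stage 3 (simplified): a `createRole` command granting read-only
    access to the restricted view instead of the underlying collection."""
    return {
        "createRole": role_name,
        "privileges": [
            {"resource": {"db": db, "collection": view_name}, "actions": ["find"]}
        ],
        "roles": [],
    }


# Illustrative record; the field names are assumptions, not the paper's data set.
sample = {
    "_id": 1,
    "patient_name": "A. Example",
    "diagnosis": "influenza",
    "admission_date": "2020-01-01",
}

# Stage 2 is reduced here to the list of fields the designer marks as hidden.
hidden = detect_sensitive_fields(sample)
view_cmd = build_view_command("admissions", "admissions_public", hidden)
role_cmd = build_role_command("hospital", "admissions_public", "analyst")
```

With a driver such as PyMongo, dictionaries of this shape could be passed to the database's command interface. In the framework described above, the equivalent artefacts are derived from the security-aware model through automated transformations and are not written by hand.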