Security in Data Warehouses

The last several years have been characterized by global companies building up massive databases containing computer users’ search queries and sites visited; government agencies accruing sensitive data and extrapolating knowledge from uncertain data with little incentive to provide citizens ways of correcting false data; and individuals who can easily combine publicly available data to derive information that, in former times, was not so readily accessible. Security in data warehouses becomes more important as reliable and appropriate security mechanisms are required to achieve the desired level of privacy protection.

INTRODUCTION

Landwehr (2001) explains that the etymological roots of the term “secure” are found in “se,” which means “without” or “apart from,” and “cure,” i.e. “to care for” or “to be concerned about.” While there are many definitions of the primary requirements of security, the classical requirements are summarized by the acronym CIA, which stands for confidentiality, integrity, and availability. All other security requirements, such as non-repudiation, can be traced back to these three basic properties. Avizienis (2004) defines confidentiality as the absence of unauthorized disclosure of information, integrity as the absence of improper system alterations, and availability as readiness for correct service.

Dependability is a broader concept that encompasses all primary aspects of security save confidentiality and, in addition:
• Reliability, which refers to the continuity of correct service.
• Safety, defined as the absence of catastrophic consequences for user(s) and environment.
• Maintainability, which is the ability to undergo modifications and repairs.

BACKGROUND

While security obviously encompasses the requirements of the CIA triad, this article will focus on the mechanism of access control (AC), as this addresses both confidentiality and, to some extent, integrity.
Database security was addressed in the 1960s by introducing mandatory access control (MAC), driven mainly by military requirements. Today, role-based access control (RBAC) is the commonly used access control model in commercial databases.

There is a difference between trusting a person and trusting a program. For instance, Alice gives Bob a program that Alice trusts. Since Bob trusts Alice, he trusts the program. However, neither of them is aware that the program contains a Trojan. This security threat led to the introduction of MAC. In MAC, the system itself imposes an access control policy, and object owners cannot change that policy. MAC is often implemented in systems with multilevel security (MLS). In MLS, information objects are classified in different levels and subjects are cleared for levels.

The need-to-know principle, also known from the military, stipulates that every subject receives only the information required to perform its task. To comply with this principle, it is not sufficient to use sensitivity labels to classify objects. Every object is associated with a set of compartments. Subjects are classified according to their security clearance for each given area/compartment. Classification labels are of the form (Ss, Sc), where Ss is a sensitivity and Sc a set of compartments. (Os, Oc) dominates (Ss, Sc) if (Ss, Sc) <= (Os, Oc). This <= relation is true when
• Ss <= Os, where the <= relationship is with respect to the classified < sensitive < secret < top secret sensitivity classification, and
• Sc <= Oc, where the <= relationship is a subset relation of sets.

The Bell LaPadula (BLP) model (1975) forms the fundamental architectural idea behind the guarantee of secrecy in MLS. The Biba model by the Mitre Corporation (1977) is used to protect integrity: BLP’s no-read-up and no-write-down properties are inverted to the no-write-up and no-read-down rules.
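The dominance relation and the two BLP rules can be illustrated with a minimal Python sketch. The names Label, dominates, may_read, and may_write are hypothetical and chosen only for this illustration; the level ordering is the one given in the text, and nothing here reflects how any particular database product implements labels.

```python
from dataclasses import dataclass

# Sensitivity ordering as stated in the text, lowest to highest.
LEVELS = ["classified", "sensitive", "secret", "top secret"]


@dataclass(frozen=True)
class Label:
    """A classification label: (sensitivity, set of compartments)."""
    sensitivity: str
    compartments: frozenset = frozenset()


def dominates(a: Label, b: Label) -> bool:
    """Return True if a dominates b, i.e. b <= a in both components."""
    level_ok = LEVELS.index(b.sensitivity) <= LEVELS.index(a.sensitivity)
    compartments_ok = b.compartments <= a.compartments  # subset test
    return level_ok and compartments_ok


def may_read(subject: Label, obj: Label) -> bool:
    """BLP no-read-up: the subject's label must dominate the object's."""
    return dominates(subject, obj)


def may_write(subject: Label, obj: Label) -> bool:
    """BLP no-write-down: the object's label must dominate the subject's."""
    return dominates(obj, subject)
```

For a subject cleared for (secret, {crypto}), may_read succeeds for an object labelled (sensitive, {crypto}) and fails for one labelled (top secret, {crypto}) or for any object in a compartment the subject lacks. Under the same assumptions, swapping the two checks gives the Biba rules when the labels are read as integrity levels.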
Today, Oracle’s Label Security and DB2’s Label-Based Access Control (LBAC) are contemporary examples of this security model.

The most widely used access control model is the role-based access control (RBAC) model. This section will briefly summarize various properties of NIST’s RBAC model as pointed out by Sandhu et al. (2000). The notion of scalability is multi-dimensional. RBAC does not define the degree of scalability implemented in a system with respect to the number of roles, number of permissions, size of role hierarchy, limits on user-role assignments, etc. As RBAC is based on permissions that confer on their holders the ability to do something, it does not contain negative authorizations (prohibitions). The nature of permissions is not specified in the RBAC model itself. Permissions can be either fine-grained or coarse-grained and may also be customized. The exact nature of permissions is determined by the application. Moreover, RBAC does not specify the ability of a user to select which roles are activated in a particular session. The only requirement is that it should be possible to allow a user to activate multiple roles simultaneously. It does not matter whether the user is able to explicitly activate roles or whether all roles are automatically activated by the system.

RBAC Constraints

Since permissions are organized into tasks by using roles, conflicts of interest are more evident than when dealing with permissions on a per-user basis. In fact, a conflict of interest among permissions on an individual basis is hard, if not impossible, to determine. Separation of duties among roles (i.e., defining mutually exclusive roles) provides the administrator with enhanced capabilities to specify and enforce enterprise policies. Since RBAC has static (user-role membership) and dynamic (role activation) aspects, the following two possibilities can be distinguished accordingly. First, Static Separation of Duties (SSD) is based on user-role membership.
It enforces constraints on the assignment of users to roles. This means that if a user is authorized as a member of one role, the user is prohibited from being a member of a second, mutually exclusive role. Constraints are inherited within a role hierarchy. Second, Dynamic Separation of Duties (DSD) is based on role activation. It is employed when a user is authorized for two or more roles that must not be activated simultaneously. DSD is necessary to prohibit a user from circumventing a policy requirement by activating another role.

Administrating RBAC

Definition of roles and constraints, assigning permissions to roles, and granting membership to roles are the most common administrative tasks in RBAC. When a new employee enters the company, the administrator simply adds this person to one or more existing roles according to the user’s tasks and needs. Similarly, users can be removed from a role when they leave the company or added to new roles when their functions change. It is commonly agreed that one of RBAC’s biggest advantages is its easy administration. Nonetheless, managing a large number of roles can still be a difficult task. However, Sandhu and Coyne (1996) present an intriguing concept that shows how RBAC might be used to manage itself. An administrative role hierarchy is introduced, which is mapped to a subset of the role hierarchy it manages.

Coexistence with MAC / DAC

Mandatory access control is based on distinct levels of security to which subjects and objects are assigned. Discretionary access control (DAC) controls access to an object on the basis of an individual user’s permissions and/or prohibitions. RBAC, however, is independent of these access controls and can coexist with MAC and DAC. RBAC can be used to enforce MAC and DAC policies, as shown by Osborn et al. (2000). The authors point out the possibilities and configurations necessary to use RBAC in the sense of MAC or DAC.
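The RBAC properties summarized above (positive permissions only, activation of several roles in one session, and static and dynamic separation of duties) can be sketched in a few lines of Python. This is an illustrative sketch under simplifying assumptions, not the NIST reference model: role hierarchies and constraint inheritance are left out, constraints are limited to pairs of roles, and the names RBAC, RBACError, assign, open_session, and permitted are hypothetical.

```python
from itertools import combinations


class RBACError(Exception):
    """Raised when an assignment or activation violates a constraint."""


class RBAC:
    def __init__(self):
        self.role_perms = {}   # role -> set of permissions (positive only)
        self.user_roles = {}   # user -> set of assigned roles
        self.ssd = set()       # frozenset pairs one user may not both hold
        self.dsd = set()       # frozenset pairs that may not be co-active

    def add_role(self, role, permissions):
        self.role_perms[role] = set(permissions)

    def assign(self, user, role):
        """Grant role membership, enforcing static separation of duties."""
        if role not in self.role_perms:
            raise RBACError(f"unknown role: {role}")
        held = self.user_roles.setdefault(user, set())
        for other in held:
            if frozenset({role, other}) in self.ssd:
                raise RBACError(f"SSD: {role} conflicts with {other}")
        held.add(role)

    def revoke(self, user, role):
        """Remove a membership, e.g. when the user's function changes."""
        self.user_roles.get(user, set()).discard(role)

    def open_session(self, user, roles):
        """Activate some of the user's roles, enforcing dynamic SoD."""
        active = frozenset(roles)
        if not active <= self.user_roles.get(user, set()):
            raise RBACError("cannot activate a role that is not assigned")
        for r1, r2 in combinations(active, 2):
            if frozenset({r1, r2}) in self.dsd:
                raise RBACError(f"DSD: {r1} and {r2} cannot be co-active")
        return active

    def permitted(self, active_roles, permission):
        """A permission holds if any active role grants it."""
        return any(permission in self.role_perms[r] for r in active_roles)
```

In this sketch an SSD pair blocks the second membership at assignment time, whereas a DSD pair allows both memberships but rejects a session that activates the two roles together. Because access is decided purely by the union of permissions of the active roles, there is no way to express a prohibition, which mirrors the absence of negative authorizations in the model.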
For a detailed discussion on defining and organizing roles, please refer to Nyanchama and Osborn (1994), who introduce a formal role graph to facilitate role administration. Ferraiolo and Kuhn (1992), for example, published fundamental concepts on granting and revoking membership to the set of specified named roles.

SCIENTIFIC CONCEPTS

Classic access control is still the mechanism of choice to protect not only databases but also data warehouses. The difference between a database and a data warehouse is that a database is designed and optimized to process individual tuples, whereas a data warehouse is optimized to respond to queries that analyze aggregated data. OLTP (On-Line Transaction Processing) systems are secured by controlling access to individual tuples, but for data warehouses the issue of data protection is more complex. Typical access control has several shortcomings. First and foremost, users can do anything with the data once they have access to it. Second, even if access to fine-grained detail data is not permitted, querying different but similar datasets can reveal fine details; this is known as an inference attack. The first issue can be addressed, in theory, by usage control as described by Park and Sandhu (2004), the second by several methods of statistical database security. Both topics are very active fields of research.

Usage Control

The main problem with data collection is that people might allow companies to use data for specific purposes (such as recommending related products) but do not consent to other uses of the same data. Usage control by Park and Sandhu (2004) is a concept that makes it possible to enforce pre- and postconditions when using data.
It is similar to a traditional reference monitor, only that the restrictions are enforced during the entire access, as proposed by Thuraisingham (2005): The privacy control would “limit and watch access to the DBMS (that access the data in the database).”

Figure 1: The UCONABC usage control model by Park and Sandhu (2004).

Statistical Database Security

A statistical database contains information about individuals, but allows only aggregate queries (such as asking for the aver

[1]  Yucel Saygin.  Privacy and Confidentiality Issues in Data Mining, 2008.

[2]  Jaehong Park, et al.  The UCONABC usage control model, 2004, TSEC.

[3]  John Wang.  Data Warehousing and Mining: Concepts, Methodologies, Tools, and Applications, 2008.

[4]  Carl E. Landwehr, et al.  Computer security, 2001, International Journal of Information Security.

[5]  Ton de Waal, et al.  Statistical Disclosure Control in Practice, 1996.

[6]  Mario Piattini, et al.  Access control and audit model for the multidimensional modeling of data warehouses, 2006, Decis. Support Syst.

[7]  D. Richard Kuhn, et al.  Role-Based Access Controls, 2009, ArXiv.

[8]  K. J. Biba, et al.  Integrity Considerations for Secure Computer Systems, 1977.

[9]  Dorothy E. Denning, et al.  Inference Controls for Statistical Databases, 1983, Computer.

[10]  David Alan Hanson, et al.  Data security, 1979, ACM-SE 17.

[11]  A. Handler.  BASIC, 1964.

[12]  Silvana Castano, et al.  Database Security, 1997, IFIP Advances in Information and Communication Technology.

[13]  Nabil R. Adam, et al.  Security-control methods for statistical databases: a comparative study, 1989, ACM Comput. Surv.

[14]  K. J. Biba.  Integrity considerations for secure computer systems, 1977.

[15]  Ravi S. Sandhu, et al.  Role-Based Access Control Models, 1996, Computer.

[16]  Elisa Bertino, et al.  Secure knowledge management: confidentiality, trust, and privacy, 2006, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[17]  Ravi S. Sandhu, et al.  The NIST model for role-based access control: towards a unified standard, 2000, RBAC '00.

[18]  Richard S. Segall, et al.  Comparing Four-Selected Data Mining Software, 2009, Encyclopedia of Data Warehousing and Mining.

[19]  Bhavani M. Thuraisingham, et al.  Privacy constraint processing in a privacy-enhanced database management system, 2005, Data Knowl. Eng.

[20]  Grigorios Loukides, et al.  Capturing data usefulness and privacy protection in K-anonymisation, 2007, SAC '07.

[21]  Matt Bishop, et al.  What Is Computer Security?, 2003, IEEE Secur. Priv.

[22]  Ravi S. Sandhu, et al.  Configuring role-based access control to enforce mandatory and discretionary access control policies, 2000, TSEC.

[23]  Mário Guimarães.  New challenges in teaching database security, 2006, InfoSecCD '06.

[24]  Carl E. Landwehr, et al.  Basic concepts and taxonomy of dependable and secure computing, 2004, IEEE Transactions on Dependable and Secure Computing.

[25]  Latanya Sweeney, et al.  k-Anonymity: A Model for Protecting Privacy, 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst.

[26]  D. Elliott Bell, et al.  Secure Computer System: Unified Exposition and Multics Interpretation, 1976.

[27]  Philip Calvert, et al.  Encyclopedia of Data Warehousing and Mining, 2006.

[28]  Dorothy E. Denning, et al.  Cryptography and Data Security, 1982.

[29]  Charu C. Aggarwal, et al.  On k-Anonymity and the Curse of Dimensionality, 2005, VLDB.

[30]  Pierangela Samarati, et al.  Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression, 1998.

[31]  Ashwin Machanavajjhala, et al.  l-Diversity: Privacy Beyond k-Anonymity, 2006, ICDE.