Enterprise Data Classification Using Semantic Web Technologies

Organizations today collect and store large amounts of data in many formats and locations, yet they are sometimes required to locate every instance of a particular type of data. Data classification makes it possible to retrieve that information efficiently when it is needed. This work presents a reference implementation for enterprise data classification using Semantic Web technologies. We demonstrate automatic discovery and classification of Personally Identifiable Information (PII) in relational databases, driven by a classification model in RDF/OWL that describes the elements to discover and classify. At the end of the process the results are also stored in RDF, which allows simple navigation between the input model and the findings in the different databases. Recorded demo link: https://www.research.ibm.com/haifa/info/demos/piidiscovery_full.htm
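
The flow described above (a model of what to look for, a scan of relational columns, findings written back as RDF) can be illustrated with a minimal sketch. It is not the reference implementation: the namespaces `http://example.org/pii#` and `http://example.org/db/`, the `classifiedAs` property, the regular-expression patterns, the 0.8 match threshold, and the sample `customers` table are all assumptions made for illustration, and a plain Python dictionary stands in for the RDF/OWL classification model. Findings are emitted as N-Triples lines so that each one links a database column to the model class it matched.

```python
import re
import sqlite3

# Hypothetical namespaces; neither comes from the paper.
PII = "http://example.org/pii#"
DATA = "http://example.org/db/"

# Stand-in for the RDF/OWL classification model: PII class IRI -> value pattern.
MODEL = {
    PII + "EmailAddress": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    PII + "USSocialSecurityNumber": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify_columns(conn, db_name, threshold=0.8):
    """Return N-Triples lines linking columns to the PII classes they match."""
    triples = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for table in tables:
        # Column 1 of each PRAGMA table_info row is the column name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            values = [str(row[0]) for row in conn.execute(
                f'SELECT "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL')]
            if not values:
                continue
            for class_iri, pattern in MODEL.items():
                hits = sum(1 for value in values if pattern.fullmatch(value))
                # Classify the column when enough of its values fit the pattern.
                if hits / len(values) >= threshold:
                    subject = f"{DATA}{db_name}/{table}/{column}"
                    triples.append(
                        f"<{subject}> <{PII}classifiedAs> <{class_iri}> .")
    return triples


# Small in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, contact TEXT, tax_id TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    ("Ann", "ann@example.org", "123-45-6789"),
    ("Bob", "bob@example.org", "987-65-4321"),
])

findings = classify_columns(conn, "crm")
for line in findings:
    print(line)
```

Because each finding uses the model's class IRI as its object, a consumer of the output could move from a class in the model to every column classified under it, which is the kind of navigation the abstract attributes to storing results in RDF.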
