Towards Semantically Enhanced Data Understanding

In machine learning, data understanding is the practice of gaining initial insights into unfamiliar datasets. This knowledge-intensive task requires extensive documentation, which data scientists need in order to grasp the meaning of the data. Usually, this documentation is kept separate from the data, scattered across external documents, diagrams, spreadsheets, and tools, which causes considerable lookup overhead. Moreover, supporting applications cannot consume and utilize such unstructured documentation. We therefore propose a methodology that uses a single semantic model to interlink data with its documentation. Data scientists can then look up information connected to the data directly by following links. Conversely, they can browse the documentation, which always refers back to the data. Furthermore, the model can be used by other approaches that provide additional support, such as searching, comparing, integrating, or visualizing data. To showcase our approach, we also demonstrate an early prototype.
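To make the idea of interlinking data with its documentation concrete, the following is a minimal sketch rather than the prototype itself. It assumes the semantic model can be approximated by subject-predicate-object triples. All identifiers (`data:orders.amount`, `doc:describedBy`, `doc:refersTo`, `doc:text`) and the helpers `build_index` and `follow` are hypothetical and chosen only for illustration.

```python
# Hypothetical triples linking data elements to documentation and back.
# None of these names come from the paper; they are illustrative only.
triples = [
    ("data:orders.amount", "doc:describedBy", "doc:amount_note"),
    ("doc:amount_note", "doc:text", "Order total in EUR, including VAT."),
    ("doc:amount_note", "doc:refersTo", "data:orders.amount"),
    ("data:orders.customer_id", "doc:describedBy", "doc:customer_note"),
    ("doc:customer_note", "doc:text", "Foreign key into the customers table."),
    ("doc:customer_note", "doc:refersTo", "data:orders.customer_id"),
]


def build_index(triples):
    """Group triples as subject -> predicate -> list of objects."""
    index = {}
    for subject, predicate, obj in triples:
        index.setdefault(subject, {}).setdefault(predicate, []).append(obj)
    return index


def follow(index, start, *predicates):
    """Follow a chain of predicates from `start` and return the nodes reached."""
    frontier = [start]
    for predicate in predicates:
        frontier = [
            obj
            for node in frontier
            for obj in index.get(node, {}).get(predicate, [])
        ]
    return frontier


index = build_index(triples)

# From a data column to its documentation text by following two links.
print(follow(index, "data:orders.amount", "doc:describedBy", "doc:text"))

# From a documentation note back to the data it refers to.
print(follow(index, "doc:customer_note", "doc:refersTo"))
```

Because both directions are stored in the same structure, the same `follow` helper serves a data scientist starting from a column as well as one browsing the documentation.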
