Complex data analysis scenarios often require discovering and combining multiple data sources. Data scientists typically formulate a series of SQL queries that build on each other, called a session, to derive results iteratively. However, due to unfamiliarity with the data sources or the complexity of query results, it can be hard to decide on the next query iteration based solely on the results of the last one. While existing approaches provide mechanisms to assess the results of a specific query, support for analyzing results in the context of the respective session remains mostly absent. Such approaches also do not integrate seamlessly with established tools and workflows. To overcome these problems, we introduce OCEANProfile, a framework for session-based profiling of query results. Query results are intercepted at the driver level and streamed into our framework for automated data profiling. Result profiles can be compared with those of previous queries and visualized in a companion app that is compatible with existing analysis tools. Visualizations are automatically ranked according to their usefulness in the context of the respective session.
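To make the idea of session-based result profiling more concrete, the following is a minimal sketch and not OCEANProfile's actual implementation. It assumes a Python DB-API connection (here `sqlite3`) in place of driver-level interception, and the names `ColumnProfile`, `profile_rows`, `ProfilingSession`, and `diff_last_two` are hypothetical. The sketch profiles each query result per column and compares the two most recent profiles of a session; the ranking of visualizations is left out.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    null_count: int
    distinct_count: int
    minimum: object
    maximum: object


def profile_rows(columns, rows):
    """Compute a simple per-column profile for one query result."""
    profiles = {}
    for idx, name in enumerate(columns):
        values = [row[idx] for row in rows]
        non_null = [v for v in values if v is not None]
        profiles[name] = ColumnProfile(
            null_count=len(values) - len(non_null),
            distinct_count=len(set(non_null)),
            minimum=min(non_null) if non_null else None,
            maximum=max(non_null) if non_null else None,
        )
    return profiles


class ProfilingSession:
    """Hypothetical wrapper: profiles every SELECT result fetched through it."""

    def __init__(self, connection):
        self.connection = connection
        self.history = []  # one profile dict per executed query, in order

    def execute(self, sql, params=()):
        # Assumes a statement that returns rows (cursor.description is set).
        cursor = self.connection.execute(sql, params)
        columns = [d[0] for d in cursor.description]
        rows = cursor.fetchall()
        self.history.append(profile_rows(columns, rows))
        return rows

    def diff_last_two(self):
        """Change in distinct count for columns shared by the last two results."""
        if len(self.history) < 2:
            return {}
        prev, curr = self.history[-2], self.history[-1]
        return {
            name: curr[name].distinct_count - prev[name].distinct_count
            for name in curr
            if name in prev
        }


# Illustrative session over made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("Berlin", 10), ("Paris", None), ("Berlin", 30)],
)
session = ProfilingSession(conn)
session.execute("SELECT city, amount FROM t")
session.execute("SELECT city, amount FROM t WHERE amount IS NOT NULL")
changes = session.diff_last_two()
print(changes)
```

Keeping one profile per query in `history` is what makes the comparison session-aware: each new result can be judged against earlier ones rather than in isolation. A real system would stream rows instead of calling `fetchall()` and would track richer statistics than the four shown here.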