Building blocks for exploratory data analysis tools

Data exploration is largely manual and labor-intensive. Although various tools and statistical techniques can be applied to data sets, little help exists for identifying which questions to ask of a data set, let alone which domain knowledge is useful in answering them. In this paper, we study user queries against production data sets in Splunk. Specifically, we use latent semantic analysis to characterize the interplay between data sets and the operations used to analyze them, and we discuss how this characterization serves as a building block for a data analysis recommendation system. This is a work-in-progress paper.
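
As a rough illustration of the idea, latent semantic analysis can be sketched as a truncated SVD over a matrix that counts how often each operation is applied to each data set. The sketch below is not the paper's pipeline: the data set names, operation names, counts, and the choice of two latent dimensions are all made up for illustration, and the helpers `lsa_embed` and `cosine` are hypothetical.

```python
import numpy as np

# Hypothetical toy data (not from the paper): rows are data sets,
# columns are query operations, entries are usage counts.
datasets = ["web_logs", "app_logs", "sales", "inventory"]
operations = ["search", "stats", "timechart", "top", "eval"]
counts = np.array([
    [9, 1, 7, 0, 1],
    [8, 2, 6, 1, 0],
    [1, 9, 0, 6, 5],
    [0, 8, 1, 7, 4],
], dtype=float)


def lsa_embed(matrix, k):
    """Project each row into a k-dimensional latent space via truncated SVD."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular values and scale the row coordinates by them.
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two latent vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


embedded = lsa_embed(counts, k=2)  # k=2 is an arbitrary choice for this sketch
sim_logs = cosine(embedded[0], embedded[1])   # web_logs vs. app_logs
sim_cross = cosine(embedded[0], embedded[2])  # web_logs vs. sales
```

In this toy setting, data sets that are analyzed with similar operations end up close together in the latent space, which is the kind of signal a recommendation system could use to suggest operations for a new data set.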
