Query Steering for Interactive Data Exploration

ABSTRACT Traditional DBMSs are suited for applications in which the structure, meaning, and contents of the database, as well as the questions to be asked, are already well understood. There is, however, a class of applications, which we collectively refer to as Interactive Data Exploration (IDE) applications, in which this is not the case. IDE is a key ingredient of a diverse set of discovery-oriented applications that we are dealing with, including ones from scientific computing, financial analysis, evidence-based medicine, and genomics. The need for effective IDE will only increase as data are collected at an unprecedented rate. IDE is fundamentally a multi-step, non-linear process with imprecise end goals. For example, data-driven scientific discovery through IDE often requires non-expert users to interact iteratively with the system to make sense of large, amorphous data sets and to identify interesting patterns and relationships in them. To make the most of the increasingly available large and complex data sets, users need an "expert assistant" who can guide them effectively and efficiently through the data space. Having a human assistant is not only expensive but also unrealistic. Thus, it is essential that we automate this task. We propose that database systems be augmented with an automated "database navigator" (DBNav) service that acts as a "tour guide" to facilitate IDE. Just as a car navigation system offers advice on the routes to be taken and displays points of interest, DBNav would steer the user towards interesting "trajectories" through the data while highlighting relevant features. Like any good tour guide, DBNav should consider many kinds of information; in particular, it should be sensitive to a user's goals and interests, as well as to the common navigation patterns that applications exhibit.
We sketch a general data navigation framework and discuss some specific components and approaches that we believe belong to any such system.