User-driven refinement of imprecise queries

We propose techniques for exploratory search in large databases. The goal is to provide new functionality that helps users home in on the right query conditions to find what they are looking for. Query refinement proceeds interactively: the system repeatedly consults the user to add or adjust query conditions. This process poses three key challenges: (1) dealing with incomplete and imprecise user input, (2) keeping user effort low, and (3) guaranteeing interactive system response times. We address the first two challenges with a probability-based framework that guides the user to the most important query conditions. To recover from input errors, we introduce the notion of sensitivity and propose efficient algorithms for identifying the most sensitive user inputs, i.e., those that had the greatest influence on the query results. For the third challenge, we develop techniques that deliver estimates of the required probabilities within a given hard real-time limit and adapt automatically as the interactive query refinement proceeds.
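To make the two central ideas concrete, the following is a minimal Python sketch and not the method of the paper. It assumes, purely for illustration, that records are dictionaries of Boolean attributes, that each record carries a probability of being the one the user wants, that the "most important" condition is the one minimizing the expected entropy of the remaining candidates, and that the sensitivity of an input is the number of records that enter or leave the result when that single input is flipped. The names `entropy`, `best_condition`, and `sensitivity` are hypothetical.

```python
import math

def entropy(weights):
    """Shannon entropy (in bits) of the distribution obtained by normalizing `weights`."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)

def best_condition(records, weights, asked=()):
    """Pick the not-yet-asked attribute whose yes/no answer leaves the
    lowest expected entropy over the candidate records (assumed criterion)."""
    total = sum(weights)
    best, best_score = None, float("inf")
    attributes = sorted({a for r in records for a in r} - set(asked))
    for attr in attributes:
        yes = [w for r, w in zip(records, weights) if r.get(attr, False)]
        no = [w for r, w in zip(records, weights) if not r.get(attr, False)]
        expected = (sum(yes) * entropy(yes) + sum(no) * entropy(no)) / total
        if expected < best_score:
            best, best_score = attr, expected
    return best

def sensitivity(records, answers):
    """For each answered condition, count the records that enter or leave
    the result when only that answer is flipped (assumed definition)."""
    def result(ans):
        return {i for i, r in enumerate(records)
                if all(r.get(a, False) == v for a, v in ans.items())}
    base = result(answers)
    return {a: len(base ^ result({**answers, a: not v}))
            for a, v in answers.items()}

# Toy data: four records with hypothetical attributes and match probabilities.
records = [
    {"red": True, "large": True, "new": True},
    {"red": True, "large": False, "new": True},
    {"red": False, "large": True, "new": True},
    {"red": False, "large": False, "new": True},
]
weights = [0.4, 0.4, 0.1, 0.1]

next_question = best_condition(records, weights)
influence = sensitivity(records, {"red": True, "new": True})
```

On this toy data, asking about `large` splits the probability mass most evenly, while `new` holds for every record and so tells the system nothing. Flipping the answer for `red` changes the result more than flipping the answer for `new`, so under the assumed definition `red` would be the first input to re-check with the user.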

[16]  Mirek Riedewald,et al.  User-driven refinement of imprecise queries , 2014, ICDE.
