Direct Manipulation Querying of Database Systems

Database systems are powerful and widely used, as their prevalence in modern business shows. For non-expert users, however, using a database remains a daunting task because of poor usability. This PhD dissertation examines the stages of the information seeking process and proposes techniques that help users interact with a database through direct manipulation, a well-established and natural interaction paradigm. For the first stage, query formulation, we propose a spreadsheet algebra on which a direct manipulation interface for database querying can be built. The algebra is powerful (capable of expressing at least all single-block SQL queries) and can be implemented intuitively in a spreadsheet. We also propose assisted querying by browsing, which helps users query the database through browsing. For the second stage, result review, rather than asking users to scan possibly many results in a flat table, we propose a hierarchical navigation scheme that lets users browse the results through representatives, with easy drill-down and filtering, along with an efficient tree-based method for generating the representatives. For the query refinement stage, we propose and implement a provenance-based automatic refinement framework: users label a set of output tuples, and the framework produces a ranked list of changes that best improve the query. Together, these techniques lower the barrier for non-expert users and reduce the effort expert users need to use a database.
