Templated Search over Relational Databases

Businesses and large organizations accumulate ever larger volumes of customer interaction data. Analyzing such data is important for tasks such as strategic planning and the orchestration of sales and marketing campaigns. However, discovery and analysis over heterogeneous enterprise data can be challenging, primarily because data repositories are dispersed, users need knowledge of the schema, and complex user interfaces are difficult to use. To address these problems, we propose a TEmplated Search paradigm (TES) for exploring relational data that combines the advantages of keyword search interfaces with the expressive power of question-answering systems. The user starts typing a few keywords and TES proposes data exploration questions in real time. A key aspect of our approach is that the displayed questions are mutually diverse and optimally cover the space of possible questions under a given question-ranking framework. We present efficient exact algorithms as well as approximation algorithms with provable guarantees. We show that the Templated Search paradigm renders potentially complex underlying data sources intelligible and easily navigable, and we support our claims with experimental results on real-world enterprise data.
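
To make the idea of proposing questions that are both highly ranked and mutually diverse more concrete, the following is a minimal sketch of a greedy, MMR-style selection. It is not the exact or approximation algorithm of the paper, whose details are not given in the abstract. The function names, the `trade_off` parameter, the relevance scores, and the token-based Jaccard similarity are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two question strings (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_diverse(
    relevance: Dict[str, float],
    k: int,
    similarity: Callable[[str, str], float] = jaccard,
    trade_off: float = 0.5,
) -> List[str]:
    """Greedily pick up to k questions, balancing relevance against redundancy.

    `relevance` maps each candidate question to a hypothetical ranking score.
    `trade_off` in [0, 1] weights relevance against similarity to the
    questions already selected (1.0 means relevance only).
    """
    selected: List[str] = []
    remaining = list(relevance)

    def marginal_score(q: str) -> float:
        # Penalize a candidate by its closest match among selected questions.
        max_sim = max((similarity(q, s) for s in selected), default=0.0)
        return trade_off * relevance[q] - (1.0 - trade_off) * max_sim

    while remaining and len(selected) < k:
        best = max(remaining, key=marginal_score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical candidate questions with made-up ranking scores.
    candidates = {
        "total sales by region": 0.9,
        "total sales by region and year": 0.85,
        "top customers by revenue": 0.6,
    }
    print(select_diverse(candidates, k=2))
```

With these made-up scores, the second pick is the lower-ranked but dissimilar question rather than the near-duplicate of the first. A greedy heuristic of this kind gives no optimality guarantee in general, so it should be read only as an illustration of the relevance-versus-diversity trade-off.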
