Keyword-based querying for the social semantic web: the KWQL language: concept, algorithm and system

Enabling non-experts to publish data on the web is an important achievement of the social web and one of the primary goals of the social semantic web. Making the data easily accessible, in turn, has received little attention, which is problematic from the point of view of incentives: users are likely to be less motivated to participate in the creation of content if the use of this content is mostly reserved for experts. Querying in semantic wikis, for example, is typically realized through full-text search over the textual content and a web query language such as SPARQL for the annotations. This approach has two shortcomings that limit the extent to which data can be leveraged by users: combined queries over content and annotations are not possible, and users are either restricted to expressing their query intent using simple but vague keyword queries or have to learn a complex web query language. The work presented in this dissertation investigates a more suitable form of querying for semantic wikis that reconciles two seemingly conflicting characteristics of query languages: ease of use and expressiveness. This work was carried out in the context of the semantic wiki KiWi, but the underlying ideas apply more generally to the social semantic web and the social web. We begin by defining a simple, modular conceptual model for the KiWi wiki that enables rich and expressive knowledge representation. One component of this model is structured tags, an annotation formalism that is simple yet flexible and expressive, and that aims at bridging the gap between atomic tags and RDF. The viability of the approach is confirmed by a user study, which finds that structured tags are suitable for quickly annotating evolving knowledge and are well received by users. The main contribution of this dissertation is the design and implementation of KWQL, a query language for semantic wikis.
KWQL combines keyword search and web querying to enable querying that scales with user experience and information need: basic queries are easy to express; as the search criteria become more complex, more expertise is needed to formulate the corresponding query. A novel aspect of KWQL is that it combines both paradigms in a bottom-up fashion. It treats neither of the two as an extension of the other, but instead integrates both in one framework. The language allows for rich combined queries over full text, metadata, document structure, and informal to formal semantic annotations. KWilt, the KWQL query engine, provides the full expressive power of first-order queries, but at the same time can evaluate basic queries at almost the speed of the underlying search engine. KWQL is accompanied by the visual query language visKWQL and by an editor that displays both the textual and the visual form of the current query and reflects changes to either representation in the other. A user study shows that participants quickly learn to construct KWQL and visKWQL queries, even when given only a short introduction. KWQL allows users to sift through the wealth of structure and annotations in an information system for relevant data. If relevant data constitutes a substantial fraction of all data, ranking becomes important. To this end, we propose PEST, a novel ranking method that propagates relevance among structurally related or similarly annotated data. Extensive experiments, including a user study on a real-life wiki, show that PEST improves the quality of the ranking over a range of existing ranking approaches.
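
The general idea of propagating relevance among related items can be illustrated with a small sketch. This is not PEST's actual formulation, which the abstract does not specify. It is a generic random-walk-with-restart iteration, and the function name `propagate_relevance`, the damping parameter `alpha`, the edge weights, and the toy pages are all assumptions made for illustration.

```python
def propagate_relevance(initial, edges, alpha=0.5, iterations=50):
    """Spread keyword-match scores along weighted links between items.

    initial: dict mapping item -> initial (keyword-based) score
    edges:   dict mapping item -> {neighbour: weight}; a weight could stand
             for a structural link or a shared annotation (an assumption)
    alpha:   share of each item's score passed on to its neighbours
    """
    items = list(initial)
    total = sum(initial.values())
    if total == 0:
        return dict(initial)  # nothing matched, nothing to propagate

    # Normalise the initial scores so they form a distribution.
    base = {item: initial[item] / total for item in items}
    scores = dict(base)

    for _ in range(iterations):
        # Every item keeps a (1 - alpha) share of its initial relevance.
        nxt = {item: (1 - alpha) * base[item] for item in items}
        for src in items:
            out = edges.get(src, {})
            weight_sum = sum(out.values())
            if weight_sum == 0:
                # Items without neighbours hand their share back to the
                # initially matching items, so the total mass is preserved.
                for item in items:
                    nxt[item] += alpha * scores[src] * base[item]
                continue
            for dst, weight in out.items():
                nxt[dst] += alpha * scores[src] * weight / weight_sum
        scores = nxt
    return scores


# Toy wiki: only page "a" matches the keyword; "b" is linked to "a";
# "c" is unrelated to both.
initial = {"a": 1.0, "b": 0.0, "c": 0.0}
edges = {"a": {"b": 1.0}, "b": {"a": 1.0}}
scores = propagate_relevance(initial, edges)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy setting the page linked to the matching page ends up ranked above the unrelated page even though neither contains the keyword, which is the effect a propagation-based ranking aims for.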
