Learning Semantic Query Suggestions

An important application of semantic web technology is recognizing human-defined concepts in text. Query transformation is a strategy often used in search engines to derive queries that return more useful search results than the original query, and most popular search engines provide facilities that let users complete, specify, or reformulate their queries. We study the problem of semantic query suggestion, a special type of query transformation based on identifying the semantic concepts contained in user queries. We use a feature-based approach in conjunction with supervised machine learning, augmenting term-based features with search history-based and concept-specific features. We apply our method to the task of linking queries from real-world query logs (the transaction logs of the Netherlands Institute for Sound and Vision) to the DBpedia knowledge base. Using a manually developed test bed, we evaluate the utility of different machine learning algorithms, features, and feature types in identifying semantic concepts, and we show significant improvements over an already high baseline. The resources developed for this paper, i.e., queries, human assessments, and extracted features, are available for download.
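
To make the feature-based setup concrete, the following is a minimal sketch, not the authors' implementation. It assumes each (query, candidate concept) pair is described by a few term-based features and that a supervised learner scores the pair. The feature definitions, the toy training pairs, and the use of logistic regression are all illustrative assumptions; the paper's actual features, data, and learning algorithms are not reproduced here, and search history-based and concept-specific features are omitted.

```python
import math


def term_features(query, label):
    """Hypothetical term-based features for a (query, concept label) pair."""
    q_terms = set(query.lower().split())
    l_terms = set(label.lower().split())
    overlap = len(q_terms & l_terms)
    return [
        1.0,                                 # bias term
        overlap / max(len(q_terms), 1),      # share of query terms found in the label
        overlap / max(len(l_terms), 1),      # share of label terms found in the query
        1.0 if q_terms == l_terms else 0.0,  # query and label use the same terms
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(examples, epochs=200, lr=0.5):
    """Fit logistic regression weights with plain stochastic gradient ascent."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            weights = [w + lr * (y - p) * xi for w, xi in zip(weights, x)]
    return weights


def score(weights, query, label):
    """Estimated probability that the concept label is relevant to the query."""
    x = term_features(query, label)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


# Made-up training pairs: (query, candidate concept label, relevant?).
TOY_PAIRS = [
    ("amsterdam", "Amsterdam", 1),
    ("world war", "World War II", 1),
    ("amsterdam", "Rotterdam", 0),
    ("world war", "Opera", 0),
]

examples = [(term_features(q, c), y) for q, c, y in TOY_PAIRS]
weights = train_logistic(examples)

# Rank candidate concepts for an unseen query by their learned score.
candidates = ["Opera", "Jazz", "Rotterdam"]
ranked = sorted(candidates, key=lambda c: score(weights, "jazz music", c), reverse=True)
```

In this sketch the learned scores are used to rank candidate concepts for a query, so that the top-ranked concepts can be offered as suggestions. Richer feature groups would simply extend the vector returned by `term_features`.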
