Using the wisdom of the crowds for keyword generation

In the sponsored search model, search engines are paid by businesses that want their ads displayed alongside the search results. Businesses bid for keywords, and their ad is displayed when a user submits one of those keywords as a query to the search engine. An important problem in this process is 'keyword generation': given a business that is interested in launching a campaign, suggest keywords that are related to that campaign. We address this problem by making use of the search engine's query logs. We identify queries related to a campaign by exploiting the associations between queries and URLs as they are captured by users' clicks. These queries make good keyword suggestions because they capture the "wisdom of the crowd" as to what is related to a site. We formulate the problem as a semi-supervised learning problem and propose algorithms within the Markov Random Field model. We perform experiments on real query logs and demonstrate that our algorithms scale to large query logs and produce meaningful results.
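The abstract does not spell out the algorithms, so the following is only an illustrative sketch of the general idea, not the paper's method. It builds a weighted query-URL click graph and runs harmonic-function label propagation (the mean of a Gaussian random field, one simple MRF-style semi-supervised technique), with the advertiser's URLs clamped to 1 and some URLs assumed to be unrelated clamped to 0. The input format `(query, url, click_count)`, the function name `harmonic_scores`, the use of negative seed URLs, the toy data, and the fixed iteration count are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def harmonic_scores(clicks, positive_urls, negative_urls, iterations=200):
    """Score each query by harmonic-function label propagation on a click graph.

    clicks: iterable of (query, url, click_count) tuples (hypothetical log format).
    positive_urls: URLs of the advertiser's site, clamped to 1.0.
    negative_urls: URLs assumed unrelated to the campaign, clamped to 0.0.
    Returns a dict mapping each query to a score in [0, 1].
    """
    # Weighted, undirected bipartite graph; nodes are tagged ("q" or "u") so a
    # query and a URL with the same string never collide.
    neighbors = defaultdict(dict)
    for query, url, count in clicks:
        if count <= 0:
            continue  # ignore rows without clicks
        q, u = ("q", query), ("u", url)
        neighbors[q][u] = neighbors[q].get(u, 0.0) + count
        neighbors[u][q] = neighbors[u].get(q, 0.0) + count

    # Labeled nodes keep fixed scores; every other node starts at 0.0, so a
    # component that contains no labeled node simply stays at 0.0.
    clamped = {("u", url): 1.0 for url in positive_urls}
    clamped.update({("u", url): 0.0 for url in negative_urls})
    score = {node: clamped.get(node, 0.0) for node in neighbors}

    # Jacobi iteration toward the harmonic solution: each unlabeled node takes
    # the click-weighted average of its neighbors' current scores.
    for _ in range(iterations):
        new_score = {}
        for node, nbrs in neighbors.items():
            if node in clamped:
                new_score[node] = clamped[node]
            else:
                total = sum(nbrs.values())
                new_score[node] = sum(w * score[n] for n, w in nbrs.items()) / total
        score = new_score

    # Only queries are keyword candidates.
    return {name: s for (kind, name), s in score.items() if kind == "q"}


clicks = [
    ("running shoes", "shoes.example.com", 10),
    ("trail running shoes", "shoes.example.com", 4),
    ("trail running shoes", "trails.example.org", 4),
    ("hiking trails", "trails.example.org", 8),
]
scores = harmonic_scores(clicks, ["shoes.example.com"], ["trails.example.org"])
# scores == {"running shoes": 1.0, "trail running shoes": 0.5, "hiking trails": 0.0}
```

Ranking the queries by score then yields candidate keywords for the shoe site. The negative seeds matter in this particular sketch: with positive labels alone, the harmonic solution converges to 1.0 for every query connected to the site, so it could not rank them. Solving the corresponding sparse linear system directly, or using a different random-walk formulation, would be reasonable alternatives.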
