This article addresses the problem of how to overview the knowledge associated with a given query keyword. In particular, the authors focus on the concerns of those who search for web pages with a given query keyword. The Web search information needs for a given query keyword are collected through search engine suggests. Given a query keyword, the authors collect up to around 1,000 suggests, many of which are redundant. They classify the redundant search engine suggests based on a topic model. However, one limitation of topic-model-based classification of search engine suggests is that the granularity of the topics, i.e., the clusters of search engine suggests, is too coarse. To overcome this problem of coarse-grained classification, this article further applies the word embedding technique to the web pages used during the training of the topic model, in addition to the text data of the whole Japanese version of Wikipedia. The authors then examine the word-embedding-based similarity between search engine suggests and further classify the search engine suggests within a single topic into finer-grained subtopics based on the similarity of word embeddings. Evaluation results show that the proposed approach performs well in the task of subtopic classification of search engine suggests.

Keywords: Clustering, Overview, Search Engine Suggest, Topic Model, Word Embeddings
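The second stage described above, splitting the suggests of one topic into finer-grained subtopics by word embedding similarity, can be illustrated with a minimal sketch. The abstract does not state which clustering procedure the authors use, so the greedy centroid-based grouping, the `threshold` value, the function names, and the toy three-dimensional vectors below are all assumptions made for illustration. Each suggest is assumed to be represented already by a single embedding vector.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def split_into_subtopics(suggest_vectors, threshold=0.8):
    """Greedy grouping (an assumed procedure, not the authors' method).

    Each suggest joins the existing subtopic whose centroid is most
    similar to it, provided the similarity reaches `threshold`;
    otherwise it starts a new subtopic.
    """
    clusters = []  # each cluster is a list of suggest strings
    for name, vec in suggest_vectors.items():
        best, best_sim = None, threshold
        for cluster in clusters:
            centroid = mean_vector([suggest_vectors[m] for m in cluster])
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([name])
        else:
            best.append(name)
    return clusters


# Hypothetical suggests of a single topic with made-up embeddings.
toy_topic = {
    "recipe": [0.9, 0.1, 0.0],
    "cooking": [0.8, 0.2, 0.1],
    "price": [0.1, 0.9, 0.1],
    "cost": [0.0, 0.8, 0.2],
}
subtopics = split_into_subtopics(toy_topic, threshold=0.8)
print(subtopics)
```

On this toy input the two cooking-related suggests and the two price-related suggests end up in separate subtopics. In practice the vectors would come from a word embedding model trained on the web pages and Wikipedia text mentioned above, and the threshold would need tuning.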