A Method of Subtopic Classification of Search Engine Suggests by Integrating a Topic Model and Word Embeddings

The background of this article is the issue of how to overview the knowledge of a given query keyword. In particular, the authors focus on the concerns of those who search for web pages with a given query keyword. The Web search information needs of a given query keyword are collected through search engine suggests. Given a query keyword, the authors collect up to around 1,000 suggests, many of which are redundant. They classify these redundant search engine suggests based on a topic model. However, one limitation of topic-model-based classification of search engine suggests is that the granularity of the topics, i.e., the clusters of search engine suggests, is too coarse. To overcome this problem of coarse-grained classification, this article further applies the word embedding technique to the web pages used during the training of the topic model, in addition to the text data of the whole Japanese version of Wikipedia. The authors then examine the word-embedding-based similarity between search engine suggests and further classify the search engine suggests within a single topic into finer-grained subtopics based on the similarity of word embeddings. Evaluation results show that the proposed approach performs well in the task of subtopic classification of search engine suggests.

Keywords: Clustering, Overview, Search Engine Suggest, Topic Model, Word Embeddings
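The subtopic step summarized above (embed the suggests of one topic, measure their pairwise similarity, and split the topic into finer groups) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how a suggest is turned into a vector or which clustering algorithm is used. Here each suggest is represented by the mean of its word vectors, pairs are compared by cosine similarity, and single-linkage grouping is applied with a hypothetical `threshold` parameter. The toy embedding table `TOY_EMB`, the English example suggests, and all function names are invented for illustration. Real suggests would be Japanese text and would need morphological segmentation rather than whitespace splitting.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embedding table (2-D for readability). In the setting the
# abstract describes, vectors would come from a word-embedding model trained on
# the topic model's web pages plus Japanese Wikipedia. The query keyword
# "python" is deliberately absent: every suggest contains it, so it carries no
# information for separating subtopics.
TOY_EMB: Dict[str, Vector] = {
    "install": [1.0, 0.0],
    "windows": [0.9, 0.1],
    "mac": [0.8, 0.2],
    "list": [0.0, 1.0],
    "dict": [0.1, 0.9],
    "sort": [0.0, 1.0],
}

# Hypothetical suggests, assumed to have been assigned to one topic already.
SUGGESTS: List[str] = [
    "python install windows",
    "python install mac",
    "python list sort",
    "python dict sort",
]


def suggest_vector(suggest: str, emb: Dict[str, Vector]) -> Vector:
    """Mean of the vectors of the known words in a suggest.

    Splits on whitespace (an assumption; Japanese text would need a
    morphological analyzer). Returns [] if no word has a vector."""
    vecs = [emb[w] for w in suggest.split() if w in emb]
    if not vecs:
        return []
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is empty or all zeros."""
    if not a or not b:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def subtopic_clusters(
    suggests: List[str], emb: Dict[str, Vector], threshold: float
) -> List[List[str]]:
    """Single-linkage grouping with union-find: two suggests end up in the same
    subtopic if a chain of pairs with cosine >= threshold connects them.
    Groups are returned in order of their first member's position."""
    vecs = [suggest_vector(s, emb) for s in suggests]
    parent = list(range(len(suggests)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(suggests)):
        for j in range(i + 1, len(suggests)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[str]] = {}
    for i, s in enumerate(suggests):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


if __name__ == "__main__":
    for group in subtopic_clusters(SUGGESTS, TOY_EMB, threshold=0.8):
        print(group)
```

With the toy data, a threshold of 0.8 separates the installation suggests from the sorting suggests, giving two subtopics. Lowering the threshold merges them back into a single group. The threshold therefore plays the role of a granularity knob. How the paper actually controls granularity is not stated in the abstract.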