The rapid growth of the internet over the past few years has produced a large repository of data on the web covering almost every subject area. As a result, search engine users have difficulty retrieving the most appropriate information from it, a situation known as the information overkill problem. The main cause of this problem is the non-optimization of web pages. This paper presents a way of analyzing the transaction logs collected by search engines in order to optimize the rank of web pages, so that topic-relevant and user-suitable documents appear at the top of the search engine's result pages. The proposed algorithm starts with the query logs maintained by a search engine to gain insight into the exact information needs of users. A novel approach is then used to find the similarity among queries based on two salient features: query keywords and clicked URLs. Next, a query cluster making tool groups similar queries into clusters based on the value of a combined similarity measure, which lies between 0 and 1. A relevancy finder tool then works on the URLs associated with each query in these clusters to determine their relevancy with respect to the query, eliminating the effect of black hat and several other search optimization techniques. A sorting algorithm is applied to each cluster to arrange all the URLs in increasing order of their relevancy, and a sequential pattern mining algorithm is then applied to them to find the most frequently accessed sequential pattern. Finally, the outcome of this procedure is improved by re-ranking the web pages using a weight calculated from the newly discovered sequential patterns and the earlier rank associated with the web pages.
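The query similarity and clustering steps can be illustrated with a minimal sketch. The abstract states only that the combined measure depends on query keywords and clicked URLs and lies between 0 and 1; everything else here is an assumption made for illustration, not the paper's actual formulation: Jaccard overlap for each feature, the mixing weight `alpha`, the cut-off `threshold`, the greedy single-pass grouping, and all function and field names.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; defined as 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(q1, q2, alpha=0.5):
    """Weighted mix of keyword and clicked-URL overlap.

    Assumption: both parts are Jaccard scores and alpha is in [0, 1],
    so the result always lies between 0 and 1.
    """
    keyword_sim = jaccard(q1["keywords"], q2["keywords"])
    url_sim = jaccard(q1["clicked_urls"], q2["clicked_urls"])
    return alpha * keyword_sim + (1 - alpha) * url_sim


def cluster_queries(queries, threshold=0.5, alpha=0.5):
    """Greedy single-pass clustering (an assumed strategy).

    Each query joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for query in queries:
        for cluster in clusters:
            if combined_similarity(query, cluster[0], alpha) >= threshold:
                cluster.append(query)
                break
        else:
            clusters.append([query])
    return clusters


# Hypothetical query-log entries, invented for this example.
query_log = [
    {"text": "cheap flights delhi",
     "keywords": {"cheap", "flights", "delhi"},
     "clicked_urls": {"a.example/flights", "b.example/deals"}},
    {"text": "delhi flights deals",
     "keywords": {"delhi", "flights", "deals"},
     "clicked_urls": {"a.example/flights"}},
    {"text": "python tutorial",
     "keywords": {"python", "tutorial"},
     "clicked_urls": {"c.example/python"}},
]

for cluster in cluster_queries(query_log):
    print([q["text"] for q in cluster])
```

With this sample data the two flight queries share half of their keywords and half of their clicked URLs, so they fall into one cluster, while the unrelated query forms its own. Comparing against only the first member of each cluster keeps the sketch short; a fuller implementation might compare against every member or a cluster centroid.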