Effective Mining Approach to Produce Quality Search Results Using Proposed Approach

Web mining applies data mining techniques to extract and process knowledge from sources such as web documents, hyperlinks, and website usage logs. Because of the sheer volume of information available through web resources, mining semantically relevant data for a search keyword is an active research problem. This work automatically extracts information from web documents using web content mining. The extracted data is preprocessed into a format suitable for further analysis. A content-based search algorithm then finds the items relevant to the searched keyword, producing an indexed set of similar results. Similar items are clustered with the quality threshold clustering algorithm, which assigns a similarity index to each result item. The weighted page ranking algorithm is then applied to the final list of items to rank the most frequently searched items. The efficiency of the proposed work is determined by the quality of the clusters and the efficiency of the query block ranking. Metrics such as cluster purity, normalized mutual information (NMI), Rand index, F-measure, and wPRF are used to evaluate the quality and the ranking efficiency of the search results. Duplicate result sets are handled and penalized to give less ambiguous results. The results are shown to outperform existing approaches such as QFI and QFJ. The quality of the result set is further evaluated by repeating the process with only the top n ranked items, with the top items shuffled, or with randomly selected items, which helps validate the claim that the proposed work surpasses the existing algorithms.
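
The clustering and evaluation steps described above can be illustrated with a minimal sketch of greedy quality threshold (QT) clustering followed by a cluster purity score. Several details are assumptions made only for illustration and are not taken from the paper: each result item is represented as a set of tokens, dissimilarity is measured with Jaccard distance, and the names `jaccard_distance`, `qt_cluster`, `purity`, and `max_diameter` are hypothetical. The paper's actual similarity index and preprocessing may differ.

```python
from collections import Counter
from typing import List, Sequence, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Assumed dissimilarity: 1 - |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def _grow_candidate(seed: int, remaining: List[int],
                    items: Sequence[Set[str]], max_diameter: float) -> List[int]:
    """Grow a candidate cluster from `seed` while its diameter stays within the threshold."""
    candidate = [seed]
    pool = [i for i in remaining if i != seed]
    while pool:
        # Distance from a point to the candidate = its largest distance to any member.
        def link(i: int) -> float:
            return max(jaccard_distance(items[i], items[j]) for j in candidate)

        nearest = min(pool, key=link)
        if link(nearest) > max_diameter:
            break  # adding any further point would exceed the quality threshold
        candidate.append(nearest)
        pool.remove(nearest)
    return candidate


def qt_cluster(items: Sequence[Set[str]], max_diameter: float) -> List[List[int]]:
    """Greedy QT clustering: repeatedly keep the largest candidate cluster."""
    remaining = list(range(len(items)))
    clusters: List[List[int]] = []
    while remaining:
        best: List[int] = []
        for seed in remaining:
            candidate = _grow_candidate(seed, remaining, items, max_diameter)
            if len(candidate) > len(best):
                best = candidate
        clusters.append(sorted(best))
        remaining = [i for i in remaining if i not in best]
    return clusters


def purity(clusters: Sequence[Sequence[int]], labels: Sequence[str]) -> float:
    """Cluster purity: fraction of items that carry their cluster's majority label."""
    total = sum(len(c) for c in clusters)
    majority = sum(max(Counter(labels[i] for i in c).values()) for c in clusters)
    return majority / total


# Toy result items for the keyword "apple" (made-up data, for illustration only).
results = [
    {"apple", "fruit", "red"},
    {"apple", "fruit", "green"},
    {"apple", "iphone", "phone"},
    {"apple", "iphone", "mac"},
    {"banana"},
]
gold = ["fruit", "fruit", "tech", "tech", "other"]

clusters = qt_cluster(results, max_diameter=0.6)
score = purity(clusters, gold)
```

In this sketch the threshold bounds the diameter of every cluster, so the number of clusters is not fixed in advance; a smaller `max_diameter` gives more, tighter clusters. Purity alone rewards many small clusters, which is one reason to report it alongside metrics such as NMI and the Rand index, as the abstract does.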
