Paper information - DeDuSERP: De-duplication in search engine result page

DeDuSERP: De-duplication in search engine result page

The Web offers a new way of providing services by arranging different resources over the web. The most critical and prominent of these services is web search. The purpose of this research is to identify a subtype of de-duplication. DeDuSERP is de-duplication in the search engine result page (SERP). It restricts the display of URLs with duplicate or similar data and hence enhances the search experience of any client. By duplicate results we mean different links containing the same content or information. To solve this problem, we have designed a filter that sits between the search engine result page and the indexed, ranked pages that the search engine returns in response to the searcher's query. This filter eliminates the duplicate links and displays only the unique results on the SERP for the searcher. We perform a string-to-string comparison of web pages; if the content is 90% similar, we judge the pages to be duplicates and then determine which of the duplicate links is the original on the basis of timestamp. That is, the web page that was crawled earlier is treated as the original. The comparison and timestamp matching are done using Commons IO 2.4, an open-source Apache API.
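The filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses Apache Commons IO 2.4, whereas the sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in for the string-to-string comparison, and the abstract does not say how the 90% figure is computed. The `Page` class, the `similarity` and `deduplicate` functions, and the choice to keep the surviving page in the rank position of the first-seen copy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List

SIMILARITY_THRESHOLD = 0.90  # the 90% cutoff stated in the abstract


@dataclass
class Page:
    """Hypothetical record for one indexed, ranked result."""
    url: str
    content: str
    crawled_at: float  # crawl timestamp, e.g. seconds since the epoch


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]. difflib is an assumed stand-in for the
    paper's string-to-string comparison, which is not specified."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def deduplicate(ranked_pages: List[Page]) -> List[Page]:
    """Filter a ranked result list, keeping one page per duplicate group."""
    kept: List[Page] = []
    for page in ranked_pages:
        for i, existing in enumerate(kept):
            if similarity(page.content, existing.content) >= SIMILARITY_THRESHOLD:
                # Duplicate found: the copy crawled earlier counts as the
                # original. It takes the rank slot of the first-seen copy
                # (an assumption; the abstract does not address ranking).
                if page.crawled_at < existing.crawled_at:
                    kept[i] = page
                break
        else:
            # No near-duplicate among the kept pages: this result is unique.
            kept.append(page)
    return kept
```

Comparing every incoming page against every kept page is quadratic in the number of results, which is tolerable for a single result page but would likely need hashing or shingling at index scale.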

Priti Dimri | Naresh Sharma
