A search result ranking algorithm based on web pages and tags clustering

With the rapid development of network technology, the web is glutted with information resources, and locating useful information effectively on the World Wide Web (WWW) is of wide interest. Because a query returns such a large volume of results, users focus only on the top few, so ranking becomes a central task for search systems. Many ranking algorithms have been proposed, but existing search systems rely only on the relationships or links between the user query and web pages in the traditional Web 1.0 environment. Meanwhile, the success and popularity of social network systems such as del.icio.us and Facebook have raised many interesting problems for the research community and offer a new viewpoint on how to improve the quality of information retrieval. This paper first summarizes existing ranking algorithms and analyzes their merits and demerits. It then presents a new search ranking algorithm based on web page and tag clustering, and uses several evaluation methods to assess it and compare it with Google.
