Developing Intelligent Search Engines

Developers of search engines today face more than technical problems such as designing an efficient crawler or distributing search requests among servers. Search has become a problem of identifying reliable information in an adversarial environment. Since the web is used for purposes as diverse as trade, communication, and advertisement, search engines need to be able to distinguish between different types of web pages. In this paper we describe some common properties of the WWW and social networks, and we show one possibility of exploiting these properties for classifying web pages.
