Algorithmic Challenges in Web Search Engines

We present the main algorithmic challenges that large Web search engines face today. These challenges arise in every module of a Web retrieval system, from gathering the data to be indexed (crawling) to selecting and ordering the answers to a query (searching and ranking). Most of the challenges ultimately concern the quality of the answer or the efficiency of obtaining it, although some, such as context-based advertising, bear on the very existence of current search engines. Because the Web grows and changes at a fast pace, the algorithms that address these challenges must rely on large-scale experimentation, in both data volume and computation time, to understand the main issues that affect them. We show examples from our own research and from the state of the art. The full version of this paper appears in [1].

[1] Jon Kleinberg, et al. Authoritative sources in a hyperlinked environment, 1999, SODA '98.

[2] B. Wellman. Computer Networks As Social Networks, 2001, Science.

[3] Ricardo A. Baeza-Yates, et al. Information retrieval in the Web: beyond current search engines, 2003, Int. J. Approx. Reason.

[4] Soumen Chakrabarti, et al. Mining the web - discovering knowledge from hypertext data, 2002.

[5] Gonzalo Navarro, et al. Compressed full-text indexes, 2007, CSUR.

[6] Ricardo A. Baeza-Yates, et al. Crawling a country: better strategies than breadth-first for web page ordering, 2005, WWW '05.

[7] Jon M. Kleinberg, et al. Query incentive networks, 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[8] Berthier A. Ribeiro-Neto, et al. Impedance coupling in content-targeted advertising, 2005, SIGIR '05.

[9] Ricardo A. Baeza-Yates, et al. WIM: an information mining model for the Web, 2005, Third Latin American Web Congress (LA-WEB'2005).

[10] Ricardo A. Baeza-Yates, et al. A Fast Set Intersection Algorithm for Sorted Sequences, 2004, CPM.

[11] Ricardo A. Baeza-Yates, et al. Query Recommendation Using Query Logs in Search Engines, 2004, EDBT Workshops.

[12] Ricardo A. Baeza-Yates, et al. A Website Mining Model Centered on User Queries, 2005, EWMF/KDO.

[13] Hemant K. Bhargava, et al. Paid placement strategies for internet search engines, 2002, WWW '02.

[14] Scott Nicholson, et al. How much of it is real? Analysis of paid placement in Web search engine results, 2006.

[15] Ricardo A. Baeza-Yates, et al. Applications of Web Query Mining, 2005, ECIR.