Enabling Time Sensitive Information Retrieval on the Web through Real Time Search Engines Using Streams

Real-time search engines continuously index web content, including content that originates from data streams. Web sources such as social networking sites, news outlets, and tweets deliver up-to-date information through streams. Because new content arrives constantly from these sources, it is challenging for search engines to maintain indexing mechanisms that are efficient enough to ensure both index freshness and index coverage. An up-to-date index supports faster search whose results include the latest available content. Two latencies play an important role in index freshness: retrieval latency and indexing latency. The former is the time taken to fetch content after its publication, while the latter is the time taken to index the newly fetched content. This paper presents a framework that optimizes both indexing latency and indexing coverage. Empirical results show that the proposed framework achieves index freshness and coverage, which in turn supports faster processing of search queries.
