Optimal Robot Scheduling for Web Search Engines

A robot is deployed by a Web search engine in order to maintain the currency of its database of Web pages. This paper studies robot scheduling policies that minimize the fractions $r_i$ of time pages spend out of date, assuming independent Poisson page-change processes and a general distribution for the page access time $X$. We show that, if $X$ is decreased in the increasing-convex ordering sense, then $r_i$ is decreased for all $i$ under any scheduling policy, and that, in order to minimize the expected total obsolescence time of any page, the accesses to that page should be as evenly spaced in time as possible. We then investigate the problem of scheduling to minimize the cost function $\sum_i c_i r_i$, where the $c_i$ are given weights proportional to the page-change rates $\mu_i$. We give a tight bound on the performance of such a policy and prove that the optimal frequency at which the robot should access page $i$ is proportional to $\ln (h_i)^{-1}$, where $h_i := \mathrm{E}\,e^{-\mu_i X}$. Note that this reduces to being proportional to $\mu_i$ when $X$ is a constant, but not, as one might expect, when $X$ has a general distribution. Next, we evaluate randomized accessing policies whereby the choices of page access are determined by independent random samples from the distribution $\{f_i\}$. We show that when the weights $c_i$ in the cost function are proportional to $\mu_i$, the minimum cost is achieved when $f_i$ is proportional to $(h_i)^{-1} - 1$. Finally, we present and analyze a heuristic policy that is especially suited to the asymptotic regime of large databases.
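The two proportionality results stated above can be illustrated numerically. The following is a minimal sketch, not the paper's own procedure: the change rates in `mu`, the choice of a constant and an exponential access time, and all function names are assumptions introduced for illustration. It uses the closed forms $h_i = e^{-\mu_i x}$ for a constant access time $x$ and $h_i = 1/(1 + \mu_i m)$ for an exponentially distributed access time with mean $m$.

```python
import math


def laplace_constant(mu, x):
    """h = E[exp(-mu * X)] when X is the constant x."""
    return math.exp(-mu * x)


def laplace_exponential(mu, mean):
    """h = E[exp(-mu * X)] when X is exponential with the given mean."""
    return 1.0 / (1.0 + mu * mean)


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def deterministic_frequencies(h):
    """Relative access frequencies proportional to ln(1 / h_i)."""
    return normalize([math.log(1.0 / hi) for hi in h])


def randomized_probabilities(h):
    """Sampling probabilities f_i proportional to 1 / h_i - 1."""
    return normalize([1.0 / hi - 1.0 for hi in h])


# Hypothetical page-change rates (assumed values, not from the paper).
mu = [0.5, 1.0, 2.0]
mu_share = normalize(mu)

h_const = [laplace_constant(m, 1.0) for m in mu]
h_exp = [laplace_exponential(m, 1.0) for m in mu]

freq_const = deterministic_frequencies(h_const)  # matches mu_share
freq_exp = deterministic_frequencies(h_exp)      # differs from mu_share
prob_exp = randomized_probabilities(h_exp)       # matches mu_share

print(mu_share, freq_const, freq_exp, prob_exp)
```

With a constant access time, $\ln (h_i)^{-1} = \mu_i x$, so the frequencies coincide with the normalized rates. With the exponential access time assumed here, $\ln (h_i)^{-1} = \ln(1 + \mu_i m)$ is no longer linear in $\mu_i$, whereas $(h_i)^{-1} - 1 = \mu_i m$, so the randomized sampling probabilities happen to be proportional to $\mu_i$ in this particular case.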
