Learning automata-based solutions to the optimal web polling problem modelled as a nonlinear fractional knapsack problem

We consider the problem of polling web pages as a strategy for monitoring the World Wide Web. The problem consists of repeatedly polling a selection of web pages so that changes that occur over time are detected. In particular, we consider the case where only a limited number of web pages can be polled per unit of time, a constraint typically dictated by the available communication bandwidth and by the speed limitations of the processing. Since only a fraction of the web pages can be polled within a given unit of time, the issue at stake is determining which web pages to poll, and we attempt to do so in a manner that maximizes the number of changes detected. We solve the problem by first modelling it as a stochastic nonlinear fractional knapsack problem. We then present an online learning automata (LA) system, namely the hierarchy of twofold resource allocation automata (H-TRAA), whose primitive component is a twofold resource allocation automaton (TRAA). Both the TRAA and the H-TRAA have been proven to be asymptotically optimal. Finally, we demonstrate empirically that the H-TRAA converges orders of magnitude faster than the learning automata knapsack game (LAKG), which represents the state of the art for this problem. Further, in contrast to the LAKG, the H-TRAA scales sub-linearly. Based on these results, we believe that the H-TRAA also has tremendous potential to handle demanding real-world applications, particularly those that deal with the World Wide Web.
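To make the knapsack view of polling concrete, the following is a minimal Python sketch under stated assumptions, not the H-TRAA itself. It assumes that each page i changes independently with a known probability q_i per time step, that a page polled at rate x_i is visited every 1/x_i steps, and that a poll detects a change if at least one update occurred since the previous poll. The function names, the detection model, and the grid size `step` are all illustrative choices that do not come from the abstract. Because the probabilities are taken as known here, a plain greedy marginal allocation is used in place of online learning.

```python
import heapq


def expected_detections(q, x):
    """Expected changes detected per time step for one page.

    Assumed model (hypothetical, for illustration): the page changes with
    probability q per step and is polled every 1/x steps, so a poll finds
    a change with probability 1 - (1 - q) ** (1 / x).
    """
    if x <= 0.0:
        return 0.0
    return x * (1.0 - (1.0 - q) ** (1.0 / x))


def greedy_allocation(change_probs, capacity, step=0.001):
    """Split a polling budget `capacity` across pages in units of `step`.

    Each unit goes to the page with the largest marginal gain. The assumed
    objective is concave in each x_i, so this greedy rule is optimal on the
    discretized grid. Every x_i is capped at 1 (one poll per time step).
    """
    x = [0.0] * len(change_probs)

    def gain(i):
        q = change_probs[i]
        return expected_detections(q, x[i] + step) - expected_detections(q, x[i])

    # heapq is a min-heap, so gains are stored negated.
    heap = [(-gain(i), i) for i in range(len(change_probs))]
    heapq.heapify(heap)

    for _ in range(int(round(capacity / step))):
        if not heap:
            break  # every page is already polled at the maximum rate
        _, i = heapq.heappop(heap)
        x[i] += step
        if x[i] + step <= 1.0 + 1e-12:
            heapq.heappush(heap, (-gain(i), i))
    return x


def total_detections(change_probs, x):
    """Sum of expected detections per time step over all pages."""
    return sum(expected_detections(q, xi) for q, xi in zip(change_probs, x))


if __name__ == "__main__":
    probs = [0.9, 0.5, 0.1]  # hypothetical per-step change probabilities
    alloc = greedy_allocation(probs, capacity=1.0)
    print("polling rates:", [round(a, 3) for a in alloc])
    print("expected detections per step:", round(total_detections(probs, alloc), 3))
```

Under this assumed model, pages that change more often receive a larger share of the budget, and the allocation detects more changes than an even split. In the setting described above the change probabilities are unknown, which is why a learning scheme is used to find the allocation online.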
