A learning-based approach for fetching pages in WebVigiL

The World Wide Web is an omnipresent and ever-expanding source of data, and the data on it is constantly growing and changing. Users are often interested in specific changes to this data. Currently, to detect changes of interest, users have to poll the relevant pages periodically and check them for those changes. WebVigiL is a general-purpose information monitoring and notification system. It handles the specification, intelligent fetching, detection, and propagation of changes as requested by a user while meeting quality-of-service requirements. We use active capability in the form of event-condition-action (ECA) rules, together with a combination of push and pull paradigms, for change monitoring. In this paper, we present an overview of the specification language and the runtime management of sentinels. We discuss in detail the use of ECA rules for fetching and the adaptive learning algorithm used to fetch pages. We conclude with the implementation status of WebVigiL.