Managing Web-Based Information

The heterogeneity and the lack of structure of World Wide Web make automated discovery, organization, and management of Web-based information a non-trivial task. Traditional search and indexing tools provide some comfort to users, but they generally provide neither structured information nor categorize, filter, or interpret documents in an automated way. In recent years, these factors have prompted the need for developing data mining techniques applied to the web, giving rise to the term “Web Mining”. This paper introduces the problem of web data extraction and gives a brief analysis of the various techniques to address it. Then, News Miner, a tool for Web Content Mining applied to the news retrieval is presented.