The heterogeneity and lack of structure of the World Wide Web make the automated discovery, organization, and management of Web-based information a non-trivial task. Traditional search and indexing tools offer users some help, but they generally neither provide structured information nor categorize, filter, or interpret documents automatically. In recent years, these limitations have prompted the development of data mining techniques for the Web, giving rise to the term “Web Mining”. This paper introduces the problem of Web data extraction and briefly analyzes the main techniques that address it. It then presents News Miner, a tool for Web Content Mining applied to news retrieval.