Web page categorization is a challenging task as web technologies continue to grow. Web pages can be categorized in many ways, depending on the approach and the features used. This paper proposes a new approach to web page categorization that uses an artificial neural network (ANN) with automatically extracted features. Eight major categories of web pages have been selected: business & economy, education, government, entertainment, sports, news & media, job search, and science. The proposed system works in three successive stages. In the first stage, features are extracted automatically by analyzing the source of the web page. In the second stage, the input values of the neural network are fixed; all values lie between 0 and 1, and variations in these values affect the output. In the third stage, the class of a given web page is determined from the eight predefined classes using the backpropagation algorithm of the ANN. The proposed approach is expected to facilitate web mining, information retrieval from the web, and search engines.
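The three stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the extracted features, the scaling rule, or the network size, so the four tag and word counts, the `caps` values, the hidden-layer width, and all class and function names (`FeatureCounter`, `extract_features`, `normalize`, `TinyMLP`) are assumptions made only for illustration. Only the eight category names, the [0, 1] input range, and the use of backpropagation come from the text.

```python
import math
import random
from html.parser import HTMLParser

# The eight categories named in the abstract.
CATEGORIES = ["business & economy", "education", "government", "entertainment",
              "sports", "news & media", "job search", "science"]


class FeatureCounter(HTMLParser):
    """Stage 1 (assumed features): count links, images, forms and words."""

    def __init__(self):
        super().__init__()
        self.counts = {"links": 0, "images": 0, "forms": 0, "words": 0}

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.counts["links"] += 1
        elif tag == "img":
            self.counts["images"] += 1
        elif tag == "form":
            self.counts["forms"] += 1

    def handle_data(self, data):
        self.counts["words"] += len(data.split())


def extract_features(html):
    """Return raw counts taken from the page source."""
    parser = FeatureCounter()
    parser.feed(html)
    return [parser.counts[k] for k in ("links", "images", "forms", "words")]


def normalize(raw, caps=(100, 50, 10, 2000)):
    """Stage 2: scale each count into [0, 1]; the caps are hypothetical."""
    return [min(v / c, 1.0) for v, c in zip(raw, caps)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """Stage 3: one-hidden-layer network trained with backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one unit; the last weight is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
             for row in self.w1]
        o = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0])))
             for row in self.w2]
        return h, o

    def train_step(self, x, target, lr=0.5):
        """One gradient step on squared error; returns the error before it."""
        h, o = self.forward(x)
        # Error terms for the output layer (sigmoid derivative is o * (1 - o)).
        d_out = [(ok - tk) * ok * (1 - ok) for ok, tk in zip(o, target)]
        # Propagate the error back to the hidden layer before changing w2.
        d_hid = [hj * (1 - hj) * sum(d_out[k] * self.w2[k][j]
                                     for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for k, row in enumerate(self.w2):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * d_out[k] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * d_hid[j] * v
        return sum((ok - tk) ** 2 for ok, tk in zip(o, target))

    def predict(self, x):
        """Return the category whose output unit is most active."""
        _, o = self.forward(x)
        return CATEGORIES[o.index(max(o))]
```

To use the sketch, a page would be passed through `normalize(extract_features(html))`, and the resulting vector would go to `TinyMLP(4, 6, 8).predict(...)` after the network has been trained with repeated `train_step` calls on labeled pages, using one-hot targets over the eight categories.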