A Novel Approach for Automatic Web Page Classification using Feature Intervals

This paper proposes WVFI, a new web page classification algorithm based on weighted voting of feature intervals. The classifier first discretizes the web page features using a supervised discretization algorithm that automatically determines the number of intervals into which each feature is divided. Each feature then predicts the class of a test web page from the class distribution of the interval into which the page's value for that feature falls. The final class of the test web page is obtained by aggregating the weighted votes of all features. Experiments on the benchmark data set WebKB show good classification accuracy compared with many existing classifiers.
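The abstract does not specify the discretization rule or how feature weights are computed, so the following is only a minimal sketch of the overall scheme under stated assumptions. The discretizer (`class_boundary_cuts`) is a simple class-boundary method standing in for the paper's supervised algorithm. Per-interval votes are normalized by class frequency in the style of voting feature intervals (VFI). Each feature's weight is assumed to be its information gain. All names here are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter, defaultdict


def class_boundary_cuts(values, labels):
    """Stand-in supervised discretizer (assumption, not the paper's algorithm):
    cut midway between adjacent distinct values unless both are pure and share
    the same class, so the number of intervals follows from the data."""
    by_value = defaultdict(set)
    for v, c in zip(values, labels):
        by_value[v].add(c)
    distinct = sorted(by_value)
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        same_pure_class = len(by_value[lo]) == 1 and by_value[lo] == by_value[hi]
        if not same_pure_class:
            cuts.append((lo + hi) / 2)
    return cuts


def entropy(class_counts, total):
    """Shannon entropy (bits) of a class-count mapping."""
    return -sum((k / total) * math.log2(k / total)
                for k in class_counts.values() if k)


def info_gain(interval_counts, class_totals, n):
    """Information gain of a discretized feature, used here as its vote
    weight (an assumed weighting scheme)."""
    after = sum(sum(cnt.values()) / n * entropy(cnt, sum(cnt.values()))
                for cnt in interval_counts.values())
    return entropy(class_totals, n) - after


class WeightedVotingFeatureIntervals:
    """Hypothetical WVFI-style classifier: each feature votes through the
    class distribution of its intervals; votes are combined by weighted sum."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        class_totals = Counter(y)
        n = len(y)
        self.cuts, self.votes, self.weights = [], [], []
        for j in range(len(X[0])):
            column = [row[j] for row in X]
            cuts = class_boundary_cuts(column, y)
            # interval index -> Counter of training classes in that interval
            counts = defaultdict(Counter)
            for v, c in zip(column, y):
                counts[bisect_right(cuts, v)][c] += 1
            votes = {}
            for k, cnt in counts.items():
                # VFI-style: divide by class frequency, then normalize to sum 1
                raw = {c: cnt[c] / class_totals[c] for c in self.classes}
                s = sum(raw.values())
                votes[k] = {c: r / s for c, r in raw.items()}
            self.cuts.append(cuts)
            self.votes.append(votes)
            self.weights.append(info_gain(counts, class_totals, n))
        return self

    def predict(self, x):
        totals = {c: 0.0 for c in self.classes}
        for j, v in enumerate(x):
            # Every interval contains at least one training value, so k is a known key.
            k = bisect_right(self.cuts[j], v)
            for c, p in self.votes[j][k].items():
                totals[c] += self.weights[j] * p
        return max(self.classes, key=totals.get)
```

For example, with the made-up term-count data `X = [[5, 0], [6, 1], [7, 0], [0, 4], [1, 5], [0, 6]]` and labels `["course"] * 3 + ["faculty"] * 3`, `fit` learns the cuts `[[3.0], [2.5]]`, and `predict([6, 0])` returns `"course"`. Information gain is only one plausible weighting. Any per-feature relevance score could replace it without changing the voting step.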
