A supervised discretization algorithm for web page classification

Search engines return a huge number of web pages for each user query, which makes it difficult to find the relevant results. This is due to the exponential growth of the information repository, the WWW. In this paper we implement a supervised discretization algorithm that uses an inconsistency measure, and we apply it to classifying large-scale data sets such as web pages. The algorithm does not require a priori knowledge about the data set and therefore determines the number of bins automatically. Experiments are conducted on WebKB, a benchmark data set for the machine learning community. The results show improved classification accuracy with discretized features compared with continuous features.
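The abstract does not spell out the algorithm, so the following is only a minimal sketch of how an inconsistency measure can drive the choice of bin count. It assumes the commonly used definition of inconsistency (for each distinct pattern of discretized values, the number of instances outside that pattern's majority class, summed and divided by the total number of instances). The equal-width binning, the function names, and the `max_bins` and `tolerance` parameters are illustrative assumptions, not the paper's method.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels):
    """Fraction of instances outside the majority class of their value pattern."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row)][label] += 1
    # For each pattern, every instance not in the majority class is inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(labels)


def equal_width_bins(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1] (assumed binning scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def choose_bins(values, labels, max_bins=10, tolerance=0.0):
    """Return the smallest bin count whose inconsistency rate is within tolerance.

    Falls back to max_bins if no smaller count satisfies the tolerance.
    """
    for n_bins in range(1, max_bins + 1):
        binned = equal_width_bins(values, n_bins)
        if inconsistency_rate([[b] for b in binned], labels) <= tolerance:
            return n_bins, binned
    return max_bins, binned


if __name__ == "__main__":
    # Toy feature (for example, a term frequency) with two classes.
    values = [1, 2, 3, 7, 8, 9]
    labels = ["course", "course", "course", "faculty", "faculty", "faculty"]
    print(choose_bins(values, labels))
```

In this sketch the number of bins is not supplied by the user: the search stops at the first bin count for which the discretized feature no longer mixes classes beyond the allowed tolerance.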
