Rough Set-Aided Feature Selection for Automatic Web-Page Classification

The number of Web pages on the World Wide Web is growing explosively, and portal sites with directory-style search engines, such as Yahoo!, now need to classify Web pages into many categories automatically. This paper investigates how rough set theory can help select relevant features for Web-page classification. Our experimental results show that combining the rough set-aided feature selection method with a Support Vector Machine with a linear kernel is useful in practice for classifying Web pages into many categories: it gives acceptable accuracy and also achieves a high degree of dimensionality reduction without depending on arbitrary thresholds for feature selection.
