A Rough Set-Aided System for Sorting WWW Bookmarks

Most people store 'bookmarks' to web pages so that they can return to a page later without having to remember its exact URL. People attempt to organise their bookmark databases by filing bookmarks under categories, which are themselves arranged hierarchically. Because maintaining such large repositories is difficult and time-consuming, a tool that categorises bookmarks automatically is needed. This paper investigates how rough set theory can help extract information from this domain for use in an experimental automatic bookmark classification system. In particular, work on rough set dependency degrees is applied to reduce the otherwise high dimensionality of the feature patterns used to characterise bookmarks. This approach to data reduction is compared with a conventional entropy-based approach.
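
To make the idea of a rough set dependency degree concrete, the following is a minimal sketch, not the system described in this paper. It assumes each bookmark is a dictionary of binary keyword features plus a category label, computes the standard dependency degree (the fraction of objects whose equivalence class under the chosen features maps to a single category), and then applies a simple greedy forward selection in the style of QuickReduct. The function names, feature names, and toy data are all hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def dependency_degree(rows, cond_attrs, decision_attr):
    """Return |POS| / |U|: the share of rows in decision-consistent classes."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, cond_attrs):
        decisions = {rows[i][decision_attr] for i in block}
        if len(decisions) == 1:  # the whole class agrees on one category
            positive += len(block)
    return positive / len(rows)


def greedy_reduct(rows, cond_attrs, decision_attr):
    """Add the attribute that raises the dependency degree most, until the
    degree of the full attribute set is reached (greedy, not guaranteed minimal)."""
    target = dependency_degree(rows, cond_attrs, decision_attr)
    chosen, best = [], 0.0
    while best < target:
        candidates = [a for a in cond_attrs if a not in chosen]
        scores = {a: dependency_degree(rows, chosen + [a], decision_attr)
                  for a in candidates}
        pick = max(candidates, key=lambda a: scores[a])
        chosen.append(pick)
        best = scores[pick]
    return chosen


# Hypothetical toy data: keyword presence per bookmark plus its category.
bookmarks = [
    {"python": 1, "news": 0, "recipe": 0, "category": "programming"},
    {"python": 1, "news": 1, "recipe": 0, "category": "programming"},
    {"python": 0, "news": 1, "recipe": 0, "category": "news"},
    {"python": 0, "news": 0, "recipe": 1, "category": "cooking"},
    {"python": 0, "news": 1, "recipe": 1, "category": "cooking"},
]
keywords = ["python", "news", "recipe"]

reduct = greedy_reduct(bookmarks, keywords, "category")
print(reduct)
```

On this toy data the keyword "news" contributes nothing to distinguishing categories, so the greedy search drops it. A real bookmark collection would have far more keywords, and the greedy search gives no guarantee of finding the smallest possible subset.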
