Using element and document profile for information clustering

The tremendous growth in the amount of available information and in the number of visitors to Web sites in recent years poses key challenges for information filtering and retrieval. Web visitors not only expect high-quality, relevant information but also want it presented as efficiently as possible. Traditional filtering methods, however, consider only the relevance values of documents and fail to consider the efficiency of document retrieval. In this paper, we propose a new algorithm that calculates an index, called the document similarity score, based on the elements of a document. From this index, a document profile is derived. Documents whose similarity score is above a given threshold are clustered. Using these pre-clustered documents, information filtering and retrieval can be made more efficient.
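The thresholded clustering step described above can be illustrated with a minimal sketch. The abstract does not define the similarity score, so this sketch assumes a Jaccard overlap between element sets as a stand-in, and it assumes that documents linked by a score above the threshold are merged transitively. The names `similarity_score`, `cluster_documents`, and the sample `profiles` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def similarity_score(elements_a, elements_b):
    """Jaccard overlap between two element sets (an assumed stand-in score)."""
    a, b = set(elements_a), set(elements_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_documents(profiles, threshold):
    """Group documents whose pairwise score is above the threshold.

    profiles: dict mapping a document id to an iterable of its elements.
    Linked documents are merged transitively (an assumption of this sketch).
    """
    parent = {doc_id: doc_id for doc_id in profiles}

    def find(doc_id):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[doc_id] != doc_id:
            parent[doc_id] = parent[parent[doc_id]]
            doc_id = parent[doc_id]
        return doc_id

    for a, b in combinations(profiles, 2):
        if similarity_score(profiles[a], profiles[b]) > threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for doc_id in profiles:
        clusters.setdefault(find(doc_id), []).append(doc_id)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical element lists, for illustration only.
profiles = {
    "d1": ["web", "filtering", "retrieval"],
    "d2": ["web", "filtering", "clustering"],
    "d3": ["fuzzy", "agent"],
}
print(cluster_documents(profiles, threshold=0.4))  # [['d1', 'd2'], ['d3']]
```

Because the clusters are computed ahead of time, a retrieval step could compare a query against one representative per cluster rather than against every document, which is one plausible reading of the efficiency gain the abstract claims.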
