An efficient signature-based strategy for supporting inexact filtering in information filtering systems

To help users find the right information in large volumes of data, Information Filtering (IF) systems push information from servers to passive users over broadcast media, rather than requiring users to search for it. To store many user profiles efficiently on servers and to filter out irrelevant users, many IF systems apply signature-based index techniques. With signatures, an IF system does not need to compare every item of every profile to filter out irrelevant profiles. However, because a signature carries only partial information about a profile, complex queries are very hard to answer from signatures alone. A critical issue for a signature-based IF service is therefore how to index the signatures of user profiles so that filtering is efficient. The inexact filtering query is one kind of query in signature-based IF systems; it filters out the data that do not qualify with respect to the query. In this paper, we propose an ID-tree index strategy, which indexes the signatures of user profiles by partitioning them into subgroups with a binary tree structure according to all of the items that differ among them. In an ID-tree, each path from the root to a leaf node is the signature of the profile pointed to by that leaf node. Because each profile is pointed to by exactly one leaf node of the ID-tree, there are no collisions in the structure. Moreover, only the items that differ among subgroups of profiles are checked at each step to filter out profiles that are irrelevant to a query. Therefore, our strategy can answer the inexact filtering query while accessing fewer profiles than Chen's signature tree strategy. Our simulation results show that, for inexact filtering, our strategy accesses fewer profiles to answer queries than Chen's signature tree strategy.
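The partitioning idea can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify how the ID-tree is built or how an inexact filtering query is matched. The sketch assumes fixed-width bit signatures stored as integers, splits each subgroup on the first bit position at which its signatures differ, and treats a profile as qualifying when its signature contains every bit set in the query signature. The names `Leaf`, `Node`, `build`, and `search` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    profile_id: int
    signature: int  # full signature, kept for the final check (assumption)


@dataclass
class Node:
    bit: int                     # bit position on which this subgroup differs
    zero: Union["Node", Leaf]    # profiles whose signature has 0 at `bit`
    one: Union["Node", Leaf]     # profiles whose signature has 1 at `bit`


def build(profiles: List[Tuple[int, int]], width: int) -> Union[Node, Leaf]:
    """Build a binary tree over (profile_id, signature) pairs.

    Signatures must be distinct, so every profile ends in its own leaf.
    """
    if not profiles:
        raise ValueError("no profiles to index")
    if len(profiles) == 1:
        pid, sig = profiles[0]
        return Leaf(pid, sig)
    for bit in range(width):
        ones = [p for p in profiles if (p[1] >> bit) & 1]
        if 0 < len(ones) < len(profiles):  # this bit separates the subgroup
            zeros = [p for p in profiles if not (p[1] >> bit) & 1]
            return Node(bit, build(zeros, width), build(ones, width))
    raise ValueError("duplicate signatures cannot be separated")


def search(tree: Union[Node, Leaf], query: int) -> List[int]:
    """Return ids of profiles whose signature contains every set bit of `query`.

    The containment rule is an assumed matching condition, not the paper's.
    """
    if isinstance(tree, Leaf):
        return [tree.profile_id] if tree.signature & query == query else []
    if (query >> tree.bit) & 1:
        # The query requires this bit, so the 0-branch cannot qualify.
        return search(tree.one, query)
    return search(tree.zero, query) + search(tree.one, query)


profiles = [(1, 0b1010), (2, 0b0110), (3, 0b1100), (4, 0b0011)]
tree = build(profiles, width=4)
matches = sorted(search(tree, 0b0010))
```

In this sketch only the bit positions that separate a subgroup are tested on the way down, which is where the reduction in accessed profiles would come from; bits shared by the whole subgroup are checked once at the leaf.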
