An Assessment on Character-based Chinese News Filtering Using Latent Semantic Indexing

We assess the Latent Semantic Indexing (LSI) approach to Chinese information filtering, specifically for Chinese news filtering agents that use a character-based, hierarchical filtering scheme. The traditional vector space model serves as the information filtering model, and each document is converted into a vector of term weights. Instead of using words as terms, as is traditional in information retrieval (IR), the terms here are individual Chinese characters. LSI captures the semantic relationships between documents and Chinese characters. We use Singular Value Decomposition (SVD) to compress the term space into a lower-dimensional space that captures latent associations between documents and terms. Experimental results show that the recall and precision rates of Chinese news filtering using the character-based approach with LSI are satisfactory.
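The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made for illustration only: raw character counts stand in for the term weights, the four toy headlines and the profile string are invented, the rank k = 2 is arbitrary, and the helper names (`char_term_doc_matrix`, `lsi_fit`, `fold_in`, `cosine`) are hypothetical. The hierarchical part of the filtering scheme is not modeled.

```python
import numpy as np

def char_term_doc_matrix(docs):
    """Build a character-by-document count matrix; each distinct character is a term."""
    vocab = sorted({ch for doc in docs for ch in doc if not ch.isspace()})
    index = {ch: i for i, ch in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for ch in doc:
            if ch in index:
                A[index[ch], j] += 1.0  # assumption: raw counts as weights
    return A, index

def lsi_fit(A, k):
    """Truncated SVD: keep the k largest singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Rows of V_k are the documents' coordinates in the k-dimensional space.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(text, index, U_k, s_k):
    """Project new text into the LSI space: q^T U_k diag(s_k)^-1.
    Assumes the kept singular values are nonzero."""
    q = np.zeros(U_k.shape[0])
    for ch in text:
        if ch in index:  # characters unseen at fit time are ignored
            q[index[ch]] += 1.0
    return (q @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Invented toy corpus: two finance headlines, two sports headlines.
docs = [
    "股市上涨投资者乐观",
    "股市下跌投资者担忧",
    "球队赢得足球比赛",
    "足球比赛球迷欢呼",
]
A, index = char_term_doc_matrix(docs)
U_k, s_k, V_k = lsi_fit(A, k=2)

# A hypothetical user profile, scored against every document in the reduced space.
profile = fold_in("股市投资", index, U_k, s_k)
scores = [cosine(profile, V_k[j]) for j in range(len(docs))]
for doc, score in zip(docs, scores):
    print(f"{score:+.3f}  {doc}")
```

A filtering agent built along these lines would pass on documents whose score exceeds a chosen threshold. No word segmentation is needed because every character is its own term, which is the main practical appeal of the character-based representation.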
