Research on Web Information Filtering Based on DCM Algorithm

In the past few years, the volume of junk information on the internet has grown tremendously, and researchers have begun to address this issue. In this paper, the DCM (Discriminative Category Matching) algorithm is employed to filter web information according to its content. To our knowledge, this is the first time the algorithm has been applied to filtering. It takes into account the relative importance of a feature within a category, its relative importance across categories, and its average importance within a category, and modifies the traditional TF-IDF term weighting scheme to compute the central vector of each domain class. Experiments indicate that the algorithm is suitable for information filtering.
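The idea of filtering by comparing a document against a class central vector can be illustrated with a minimal sketch. This is not the DCM weighting itself, whose formulas are not given here. It substitutes plain TF-IDF with cosine-style similarity as a stand-in, and all function names, the whitespace tokenizer, and the toy training data are assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def idf_table(docs):
    # Standard smoothed IDF (stand-in for the modified DCM weighting).
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    # L2-normalised TF-IDF vector; terms unseen in training are ignored.
    counts = Counter(tokenize(text))
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def central_vector(docs, idf):
    # Class central vector: mean of the document vectors in the class.
    total = Counter()
    for doc in docs:
        for t, w in tfidf_vector(doc, idf).items():
            total[t] += w
    return {t: w / len(docs) for t, w in total.items()}


def similarity(vec, centroid):
    # Dot product between a document vector and a class central vector.
    return sum(w * centroid.get(t, 0.0) for t, w in vec.items())


def classify(text, centroids, idf):
    # Assign the class whose central vector is most similar to the document.
    vec = tfidf_vector(text, idf)
    return max(centroids, key=lambda label: similarity(vec, centroids[label]))


# Hypothetical toy data, for illustration only.
training = {
    "junk": ["win free prize now", "free money click now"],
    "normal": ["meeting agenda for monday", "project report draft attached"],
}
all_docs = [d for docs in training.values() for d in docs]
idf = idf_table(all_docs)
centroids = {label: central_vector(docs, idf) for label, docs in training.items()}
label = classify("click now for a free prize", centroids, idf)
```

In a filter built this way, documents assigned to the junk class would be discarded. A DCM-based variant would presumably replace `idf_table` and `tfidf_vector` with weights that combine within-category and cross-category feature importance, as the abstract describes.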
