A New Topic Filter Based on Maximum Entropy Model

Because of the scale of the web and the demand for information in specialized fields, focused search has attracted growing interest. Natural language is complex, and individual words are often ambiguous, which makes topic filtering difficult. This paper addresses the two main problems, false positives and false negatives, with a separate method for each. Using machine learning, we build a guide model based on the maximum entropy principle to filter out noise pages, and we use the KNN method to address the false negative problem. Experiments show that our methods outperform the baseline method.
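The abstract does not specify the features, the training procedure, or how the two components are combined, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes bag-of-words features, a two-class maximum entropy model (equivalent to logistic regression) trained by batch gradient ascent, and a cosine-similarity KNN vote that can rescue pages the model rejects. All function names, the toy pages, and the thresholds are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def features(text):
    """Bag-of-words counts (an assumed feature set, not taken from the paper)."""
    return Counter(text.lower().split())

def p_on_topic(model, text):
    """Two-class maximum entropy probability that a page is on topic."""
    weights, bias = model
    z = bias + sum(weights.get(w, 0.0) * c for w, c in features(text).items())
    z = max(-30.0, min(30.0, z))  # clamp so math.exp stays in range
    return 1.0 / (1.0 + math.exp(-z))

def train_maxent(pages, labels, epochs=300, lr=0.1):
    """Fit feature weights by batch gradient ascent on the log-likelihood."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        grad, grad_bias = Counter(), 0.0
        for text, y in zip(pages, labels):
            err = y - p_on_topic((weights, bias), text)  # observed minus expected
            grad_bias += err
            for w, c in features(text).items():
                grad[w] += err * c
        bias += lr * grad_bias
        for w, g in grad.items():
            weights[w] = weights.get(w, 0.0) + lr * g
    return weights, bias

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_on_topic(text, pages, labels, k=3):
    """Majority vote among the k labeled pages most similar to the given page."""
    f = features(text)
    ranked = sorted(zip(pages, labels),
                    key=lambda pl: cosine(f, features(pl[0])), reverse=True)
    votes = [y for _, y in ranked[:k]]
    return sum(votes) * 2 > len(votes)

def keep_page(text, model, pages, labels, threshold=0.5, k=3):
    """Keep a page if the model accepts it, or if KNN rescues it (assumed combination)."""
    return p_on_topic(model, text) >= threshold or knn_on_topic(text, pages, labels, k)

# Toy data: "jaguar" is ambiguous between the animal (on topic, label 1)
# and the car (noise, label 0).
pages = [
    "jaguar habitat rainforest predator cat",
    "jaguar rainforest prey hunting wildlife",
    "predator cat wildlife big conservation",
    "jaguar dealership engine car price",
    "jaguar engine sedan luxury warranty",
    "car price warranty used loan",
]
labels = [1, 1, 1, 0, 0, 0]
model = train_maxent(pages, labels)
```

In this sketch, `keep_page("jaguar rainforest predator wildlife", model, pages, labels)` accepts the page and `keep_page("jaguar engine price warranty", model, pages, labels)` rejects it, even though both contain the ambiguous word. The surrounding words carry the evidence, which is the kind of context a maximum entropy model can weigh. The logical OR between the model and the KNN vote is one simple way to reduce false negatives; the paper may combine them differently.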
