论文信息 - A Survey on Information Retrieval, Text Categorization, and Web Crawling

A Survey on Information Retrieval, Text Categorization, and Web Crawling

This paper is a survey discussing Information Retrieval concepts, methods, and applications. It goes deep into the document and query modelling involved in IR systems, in addition to pre-processing operations such as removing stop words and searching by synonym techniques. The paper also tackles text categorization along with its application in neural networks and machine learning. Finally, the architecture of web crawlers is to be discussed shedding the light on how internet spiders index web documents and how they allow users to search for items on the web.

Youssef Bassil

[1] Gerard Salton,et al. A vector space model for automatic indexing , 1975, CACM.

[2] Joseph Becker,et al. Information Retrieval and Processing , 1975 .

[3] Yiming Yang,et al. A re-examination of text categorization methods , 1999, SIGIR '99.

[4] James H. Martin,et al. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition , 2000, Prentice Hall series in artificial intelligence.

[5] Hans Peter Luhn,et al. A Statistical Approach to Mechanized Encoding and Searching of Literary Information , 1957, IBM J. Res. Dev..

[6] Christopher D. Manning,et al. Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[7] Ronen Feldman,et al. Book Reviews: The Text Mining Handbook: Advanced Approaches to Analyzing Unstructured Data by Ronen Feldman and James Sanger , 2008, CL.

[8] Bing Liu,et al. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data , 2006, Data-Centric Systems and Applications.