Categorisation by Context

The traditional approach to document categorization is categorization by content: the information used to categorize a document is extracted from the document itself. In a hypertext environment like the Web, the structure of documents and the link topology can be exploited to perform what we call categorization by context [Attardi 98]: the context surrounding a link in an HTML document is used to categorize the document referred to by the link. Because it does not rely on the ability to analyze the content of documents, categorization by context can also deal with multimedia material. It leverages the categorization activity implicitly performed when someone places or refers to a document on the Web. By focusing the analysis on the documents used by a group of people, one can build a catalogue tuned to the needs of that group. Categorization by context is based on the following assumptions:
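
To illustrate the mechanism described in this section, the following is a minimal sketch, not the authors' implementation. It assumes that the "context" of a link is just its anchor text plus the nearest preceding heading, and that categories are given as hand-written keyword sets. The names `LinkContextParser`, `categorize`, `categorize_page`, `PAGE` and `CATEGORIES` are hypothetical and were chosen for this example only. The linked documents (an audio file and an image) are never fetched or analyzed, which is what lets the approach cover multimedia material.

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3")


class LinkContextParser(HTMLParser):
    """Collect, for every <a href>, its anchor text and the nearest preceding heading."""

    def __init__(self):
        super().__init__()
        self.contexts = {}        # target URL -> list of context strings
        self._heading = ""        # text of the most recent heading
        self._in_heading = False
        self._href = None         # URL of the link currently open, if any
        self._anchor = []         # text fragments seen inside the open link

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._in_heading = True
            self._heading = ""
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._anchor = []

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._in_heading = False
        elif tag == "a" and self._href is not None:
            anchor_text = " ".join("".join(self._anchor).split())
            context = f"{self._heading} {anchor_text}".strip()
            self.contexts.setdefault(self._href, []).append(context)
            self._href = None

    def handle_data(self, data):
        if self._in_heading:
            self._heading += data
        if self._href is not None:
            self._anchor.append(data)


def categorize(contexts, categories):
    """Assign each URL the category whose keywords overlap most with its context words."""
    result = {}
    for url, texts in contexts.items():
        words = set(" ".join(texts).lower().split())
        scores = {name: len(words & keywords) for name, keywords in categories.items()}
        best = max(scores, key=scores.get)
        # No keyword overlap at all means the link stays uncategorized.
        result[url] = best if scores[best] > 0 else None
    return result


def categorize_page(html, categories):
    """Parse one HTML page and categorize the documents it links to."""
    parser = LinkContextParser()
    parser.feed(html)
    parser.close()
    return categorize(parser.contexts, categories)


# Hypothetical sample page and category catalogue (assumptions for illustration).
PAGE = """
<h2>Jazz recordings</h2>
<ul>
<li><a href="http://example.org/coltrane.mp3">John Coltrane live</a></li>
</ul>
<h2>Cooking</h2>
<p>A good <a href="http://example.org/pasta.jpg">pasta recipe photo</a>.</p>
"""

CATEGORIES = {
    "music": {"jazz", "recordings", "concert"},
    "food": {"cooking", "recipe", "pasta"},
}

if __name__ == "__main__":
    print(categorize_page(PAGE, CATEGORIES))
```

In this sketch the audio file ends up under "music" and the image under "food" purely because of the words that surround the links. A realistic system would need a richer notion of context and a weighting scheme; keyword overlap is used here only to keep the example short.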
