Document Indexing with a Concept Hierarchy

Given a large hierarchical concept dictionary (a thesaurus or an ontology), the task of selecting the concepts that describe the contents of a given document is considered. A statistical method of document indexing driven by such a dictionary is proposed. The method is insensitive to inaccuracies in the dictionary, which allows for semiautomatic translation of the hierarchy into different languages. The problem of handling non-terminal nodes in the hierarchy, especially top-level ones, is discussed. Methods of automatically assigning weights to the nodes and links in the hierarchy in a way that agrees with common sense are presented. The application of the method in the Classifier system is discussed.
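As a rough illustration of dictionary-driven indexing, the sketch below counts keyword hits for terminal concepts and passes a damped share of each score up to the parent nodes. It is not the method of the paper, whose formulas are not given in this abstract: the toy hierarchy, the keyword and link weights, and the names `PARENT`, `KEYWORDS`, and `score_concepts` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy hierarchy: node -> (parent, weight of the link to the parent).
PARENT = {
    "ROOT": (None, 0.0),
    "science": ("ROOT", 0.1),
    "physics": ("science", 0.5),
    "biology": ("science", 0.5),
}

# Hypothetical keyword lists for terminal nodes: word -> weight.
KEYWORDS = {
    "physics": {"quantum": 1.0, "particle": 0.8},
    "biology": {"cell": 1.0, "gene": 0.9},
}


def score_concepts(text):
    """Score every concept by weighted keyword counts propagated upward."""
    counts = Counter(text.lower().split())
    scores = Counter()
    for node, words in KEYWORDS.items():
        # Weighted number of keyword occurrences for this terminal node.
        score = sum(weight * counts[word] for word, weight in words.items())
        # Pass a damped share of the score to each ancestor in turn.
        while node is not None and score > 0:
            scores[node] += score
            node, link = PARENT[node]
            score *= link
    return scores


scores = score_concepts("quantum particle quantum cell")
best = max(scores, key=scores.get)
print(best, dict(scores))
```

In this toy setup, link weights below 1 keep non-terminal nodes such as `ROOT` from outranking the specific concepts that actually matched, which is one simple way to address the top-level node problem the abstract mentions.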
