Automatic Topic Identification Using Ontology Hierarchy

This paper proposes a method that uses an ontology hierarchy for automatic topic identification. The fundamental idea is to exploit the hierarchical structure of an ontology to find the topic of a text. Keywords extracted from a given text are mapped onto their corresponding concepts in the ontology. By optimizing over these concepts, we select a single node among the concept nodes that we believe is the topic of the target text. However, a limited-vocabulary problem arises when mapping the keywords onto their corresponding concepts. This forces us to extend the ontology by enriching each of its concepts with new concepts drawn from an external linguistic knowledge base (WordNet). Our intuition is that the more keywords are mapped onto ontology concepts, the better our topic identification technique performs.
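The pipeline described above (map keywords to concepts, enrich the vocabulary, then select one node) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy ontology `PARENT`, the `ENRICHMENT` table standing in for WordNet lookups, and in particular the selection rule (the deepest node whose subtree covers a given fraction of the mapped keywords). The abstract does not specify the optimization actually used, so this rule should not be read as the authors' method.

```python
from collections import Counter

# Hypothetical toy ontology: child concept -> parent concept (root maps to None).
PARENT = {
    "science": None,
    "biology": "science",
    "physics": "science",
    "genetics": "biology",
    "ecology": "biology",
    "optics": "physics",
}

# Hypothetical enrichment table standing in for WordNet lookups:
# extra vocabulary term -> ontology concept it is attached to.
ENRICHMENT = {
    "gene": "genetics",
    "dna": "genetics",
    "habitat": "ecology",
    "lens": "optics",
}


def map_keywords(keywords, use_enrichment=True):
    """Return the ontology concept for every keyword that can be mapped."""
    mapped = []
    for kw in keywords:
        kw = kw.lower()
        if kw in PARENT:
            mapped.append(kw)
        elif use_enrichment and kw in ENRICHMENT:
            mapped.append(ENRICHMENT[kw])
    return mapped


def ancestors_and_self(concept):
    """Yield the concept and then each of its ancestors up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT[concept]


def depth(concept):
    """Number of edges between the concept and the root."""
    return sum(1 for _ in ancestors_and_self(concept)) - 1


def identify_topic(keywords, coverage=0.6, use_enrichment=True):
    """Assumed selection rule: deepest node covering `coverage` of mapped keywords."""
    mapped = map_keywords(keywords, use_enrichment)
    if not mapped:
        return None  # the limited-vocabulary problem: nothing could be mapped
    subtree_hits = Counter()
    for concept in mapped:
        for node in ancestors_and_self(concept):
            subtree_hits[node] += 1
    threshold = coverage * len(mapped)
    # The root always qualifies, so the candidate list is never empty.
    candidates = [n for n, hits in subtree_hits.items() if hits >= threshold]
    return max(candidates, key=lambda n: (depth(n), subtree_hits[n]))


if __name__ == "__main__":
    words = ["gene", "DNA", "habitat", "lens"]
    print(identify_topic(words))                        # biology
    print(identify_topic(words, use_enrichment=False))  # None
```

In this toy setting none of the four keywords is itself a concept name, so without enrichment no topic can be chosen. With the enrichment table, three of the four keywords fall under `biology`, which is then selected over the more general `science`.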
