Mining textual data through term variant clustering: the TermWatch system

We present TermWatch, a system for mapping the structure of research topics in a corpus. TermWatch portrays the "aboutness" of a corpus of scientific and technical publications by bridging the gap between purely statistical approaches and symbolic techniques. In this paper, we report an unsupervised text mining experiment on a corpus of scientific titles and abstracts from 16 prominent IR journals. Preliminary results show that TermWatch captures low-frequency phenomena that the usual clustering methods based on co-occurrence may not highlight. The results also reflect the expressive power of terminological variation as a means of capturing the structure of research topics contained in a corpus.
