Extracting Semantic Relationships between Terms: Supervised vs. Unsupervised Methods

As the volume of electronic documents (corpora, dictionaries, newspapers, newswires, etc.) grows and diversifies, there is a need to extract information automatically from these texts.

In order to extract terms and relations between terms, two methods can be used. The first method is the unsupervised approach, which requires a term extraction module and a few predefined types, especially term types, in order to find relationships between terms and to assign appropriate types to the relationships.

Work on automatic term recognition usually involves predefining a set of term patterns, an extraction procedure, and a scoring mechanism to filter out non-relevant candidates. Smadja (1993) describes a set of techniques based on statistical methods for retrieving collocations from large text collections. Daille (1996) presents a combination of linguistic filters and statistical methods to extract two-word terms. This work implements finite automata for each term pattern, then compares various statistical scores for ranking the extracted terms.

Unsupervised identification of term relationships is a more complicated task, reported in works from various fields including Computational Linguistics and Knowledge Discovery in Texts. A keyword-based model for text mining is described in Feldman and Dagan (1995). The work suggests using a wide range of KDD (Knowledge Discovery in Databases) operations on collections of textual documents, including association discovery among keywords within the documents. Cooper and Byrd (1997) reports the T