A Hybrid Approach for Automatic Classification of Chinese Unknown Verbs

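* Institute of Information Science, Academia Sinica. 曾慧馨, E-mail: huihsin@iis.sinica.edu.tw; 陳克健, E-mail: kchen@iis.sinica.edu.tw
+ Department of Computer Science, National Chengchi University. E-mail: chaolin@nccu.edu.tw
** Department of Foreign Languages and Literatures, National Taiwan University. E-mail: zmgao@ccms.ntu.edu.tw

This paper combines two methods to predict the category of unknown verbs.

The first is a rule-based method, which induces the morphological regularities of unknown verbs from the training corpus. It makes its decisions in two main ways: (1) by the key characters that appear in an unknown verb, and (2) by the compositional pattern of the unknown verb.

The key-character approach first divides verbs into four groups by length: two-character words, three-character words, four-character words, and words of five or more characters. Observation of real corpus data showed that verbs of different lengths have different structures, which is why the corpus is grouped by word length. For example, two rules, 「好」 and 「出」, can be learned from three-character words to determine a verb's category, whereas unknown verbs of other lengths have no such rules. Likewise, the 「化」 rule does not apply to two-character verbs.

The second part of the rule-based method classifies a verb by its compositional pattern. Some unknown verbs are composed in highly regular ways, so we generalized the compositions of the unknown verbs in the training corpus and obtained nine patterns. Over ten experiments, the rule-based method could handle about 23.19% of the unknown verbs on average, and 91.67% of its predictions were correct.

The second is a similarity-based method, which predicts the category of an unknown verb from examples that resemble it. It relies mainly on HowNet (知網) and the Academia Sinica Chinese sentence-structure treebank 1.0 (中央研究院中文句結構樹資料庫 1.0) to measure semantic and categorial similarity. The similarity between the unknown verb and each known verb is computed, and the unknown verb is assigned the category of the most similar example.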
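The two-stage decision described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the rule table, the category labels (`CAT_A` and so on), and the choice to match the key character against the last character of the word are all hypothetical, and a simple character-overlap score stands in for the HowNet and treebank similarity measures, which are not reproduced here.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical key-character rules, grouped by word length
# (2, 3, 4, or 5 meaning "five or more characters").
# The labels CAT_A, CAT_B, CAT_C are placeholders, not the paper's tagset.
KEY_CHAR_RULES: Dict[int, Dict[str, str]] = {
    3: {"好": "CAT_A", "出": "CAT_B", "化": "CAT_C"},
    4: {"化": "CAT_C"},
}


def length_group(word: str) -> int:
    """Map a word to its length group: 2, 3, 4, or 5 (five or more)."""
    return min(max(len(word), 2), 5)


def classify_by_rule(word: str) -> Optional[str]:
    """Return a category if a key-character rule fires, otherwise None.

    Assumption: the key character is the last character of the word.
    """
    if not word:
        return None
    rules = KEY_CHAR_RULES.get(length_group(word), {})
    return rules.get(word[-1])


def char_overlap(a: str, b: str) -> float:
    """Stand-in similarity: Jaccard overlap of the two character sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def classify_by_similarity(
    word: str,
    known: List[Tuple[str, str]],
    sim: Callable[[str, str], float] = char_overlap,
) -> str:
    """Return the category of the known verb most similar to `word`."""
    _, best_category = max(known, key=lambda pair: sim(word, pair[0]))
    return best_category


def classify(word: str, known: List[Tuple[str, str]]) -> str:
    """Try the rules first; fall back to the most similar known verb."""
    return classify_by_rule(word) or classify_by_similarity(word, known)


# Hypothetical known verbs with placeholder categories.
known_verbs = [("討論", "CAT_D"), ("美麗", "CAT_E")]
print(classify("準備好", known_verbs))  # rule fires for a three-character word
print(classify("討價", known_verbs))    # no rule, falls back to similarity
```

Applying the rules first and using similarity only as a fallback reflects the reported figures: the rules cover a minority of unknown verbs but are highly accurate on the ones they do cover.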
