Abbreviation Generation for Japanese Multi-Word Expressions

This paper proposes a novel method for generating Japanese abbreviations from their full forms using a log-linear model (LLM), which allows the method to exploit characteristic patterns of Japanese abbreviation. Our experimental results show that the method is effective for TV program titles that contain colloquial expressions. The proposed method achieved 78.8% recall within the top 30 candidates, whereas a baseline method using conditional random fields (CRFs) achieved 68.3% recall. Moreover, experiments on six data sets, classified by character type and semantic category, show that the performance of both methods depends on the type of the full form.
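As a rough illustration of the setting described above, the following Python sketch ranks candidate abbreviations with a log-linear model and computes recall within the top k candidates. It is only a sketch under stated assumptions: the abstract does not specify the candidate generator, the features, or the weights, so `candidates`, `features`, and `WEIGHTS` below are hypothetical stand-ins (character subsequences, two toy indicator features, hand-set weights) rather than the paper's actual design.

```python
import math
from itertools import combinations

def candidates(full_form, max_len=4):
    """Enumerate order-preserving character subsequences (assumed generator)."""
    n = len(full_form)
    out = set()
    for k in range(1, min(max_len, n) + 1):
        for idx in combinations(range(n), k):
            out.add("".join(full_form[i] for i in idx))
    return sorted(out)

def features(full_form, abbr):
    """Hypothetical feature functions; not the paper's feature set."""
    return {
        "keeps_first_char": 1.0 if abbr[0] == full_form[0] else 0.0,
        "length_is_4": 1.0 if len(abbr) == 4 else 0.0,
    }

# Hand-set weights for illustration only; a real model would learn them.
WEIGHTS = {"keeps_first_char": 1.0, "length_is_4": 1.0}

def rank(full_form, weights, max_len=4):
    """Score each candidate with p(a | x) proportional to exp(w . f(x, a))."""
    cands = candidates(full_form, max_len)
    scores = [
        sum(weights.get(name, 0.0) * value
            for name, value in features(full_form, c).items())
        for c in cands
    ]
    top = max(scores)  # subtract the max score for numerical stability
    z = sum(math.exp(s - top) for s in scores)
    probs = [math.exp(s - top) / z for s in scores]
    # Sort by descending probability, breaking ties by candidate string.
    return sorted(zip(cands, probs), key=lambda cp: (-cp[1], cp[0]))

def recall_at_k(ranked_lists, gold, k):
    """Fraction of full forms whose gold abbreviation is in the top k."""
    hits = sum(
        1 for ranked, g in zip(ranked_lists, gold)
        if g in [c for c, _ in ranked[:k]]
    )
    return hits / len(gold)

ranked = rank("パーソナルコンピュータ", WEIGHTS)
```

Calling `recall_at_k([ranked], ["パソコン"], 30)` mirrors the top-30 recall measure reported above for a single full form. With these toy weights the result says nothing about the paper's figures.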
