Paper Information - Statistical Model for Japanese Abbreviations

Statistical Model for Japanese Abbreviations

We present a new approach to detect abbreviations given a root expression. The method is based on a statistical model combining two internal models: a generation and a verification model. The statistical model accounts for both the validity of abbreviations as a character sequence generated from a root (as learnt from the collection of abbreviation-root pairs) and their social validity, indicating how they are really used in the world (as obtained from a web search engine). The experimental results showed that our method outperforms traditional template-based methods. Specifically, using co-occurrence in the verification model yielded the best performance in our method.

Manabu Okumura | Norifumi Murayama
