Statistical Model for Japanese Abbreviations

We present a new approach to detecting abbreviations for a given root expression. The method is based on a statistical model that combines two internal models: a generation model and a verification model. The statistical model accounts both for the validity of an abbreviation as a character sequence generated from its root (learnt from a collection of abbreviation-root pairs) and for its social validity, i.e., whether it is actually used in the real world (estimated with a web search engine). Experimental results show that our method outperforms traditional template-based methods. Within our method, using co-occurrence in the verification model gave the best performance.
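The abstract does not specify either internal model, so the following is only a minimal sketch of how a generation score and a co-occurrence verification score might be combined. Several pieces are assumptions and do not come from the paper:

- Candidates are order-preserving character subsequences of the root.
- The generation model keeps or drops each character independently, with probabilities (`KEEP_PROB`) that depend on whether the character starts a word. In practice these would be learnt from abbreviation-root pairs.
- Verification uses a smoothed Dice coefficient over search-engine hit counts. The `hits`/`cohits` dictionaries are hypothetical stand-ins for web queries.

All function names and the weight `lam` are illustrative.

```python
import math
from itertools import combinations

# Hypothetical keep probabilities (assumption); a real system would learn these
# from abbreviation-root pairs. "initial" = first character of a word in the root.
KEEP_PROB = {"initial": 0.8, "other": 0.1}

def char_types(words):
    """Label each character of the segmented root as word-initial or not."""
    types = []
    for w in words:
        types.extend(["initial"] + ["other"] * (len(w) - 1))
    return types

def generation_logprob(types, kept):
    """log P(abbr | root) under an independent keep/drop model per character."""
    total = 0.0
    for i, t in enumerate(types):
        p = KEEP_PROB[t]
        total += math.log(p if i in kept else 1.0 - p)
    return total

def verification_logscore(abbr, root, hits, cohits):
    """Add-one smoothed log Dice coefficient from (hypothetical) hit counts."""
    h_a = hits.get(abbr, 0)
    h_r = hits.get(root, 0)
    h_ar = cohits.get((abbr, root), 0)
    return math.log((2 * h_ar + 1) / (h_a + h_r + 1))

def rank_abbreviations(words, hits, cohits, lam=1.0, min_len=2):
    """Score order-preserving subsequences of the root; return them best-first."""
    root = "".join(words)
    types = char_types(words)
    best = {}
    for k in range(min_len, len(root)):  # proper subsequences only
        for idx in combinations(range(len(root)), k):
            abbr = "".join(root[i] for i in idx)
            score = (generation_logprob(types, set(idx))
                     + lam * verification_logscore(abbr, root, hits, cohits))
            # Keep the best score if repeated characters yield the same string.
            best[abbr] = max(score, best.get(abbr, -math.inf))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # Toy, made-up counts for illustration only.
    hits = {"東京大学": 1000, "東大": 800, "東京": 5000}
    cohits = {("東大", "東京大学"): 300, ("東京", "東京大学"): 900}
    print(rank_abbreviations(["東京", "大学"], hits, cohits)[0][0])
```

With these toy counts the top-ranked candidate for 東京大学 is 東大. Summing log scores with a weight `lam` is just one simple way to combine the two models. It lets the generation model propose plausible character selections while the verification model penalizes strings that rarely co-occur with the root.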