Pointwise Prediction for Robust, Adaptable Japanese Morphological Analysis

We present a pointwise approach to Japanese morphological analysis (MA) that makes each decision independently, ignoring structure information during both learning and tagging. Despite the lack of structure, it outperforms the current state-of-the-art structured approach for Japanese MA and achieves accuracy similar to that of structured predictors using the same feature set. We also find that the method is both robust to out-of-domain data and easily adapted through a combination of partial annotation and active learning.
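To illustrate the pointwise idea for the word-segmentation part of MA, here is a minimal sketch. It assumes each gap between adjacent characters is classified independently as a word boundary (+1) or not (-1), using character n-grams in a small window around the gap. Several pieces are hypothetical: the feature templates, the window size, the toy sentences, and all function names. The perceptron learner is also an assumption, swapped in for brevity, since the abstract does not specify the classifier. Because every gap is a separate training example, a partially annotated sentence contributes only its labelled gaps. During active learning, the gaps whose scores lie closest to zero are natural candidates to send to an annotator.

```python
from collections import defaultdict


def gap_features(text, i, win=2):
    """Hypothetical feature templates for the gap between text[i] and text[i+1]:
    character unigrams and bigrams in a window of `win` characters per side."""
    padded = "^" * win + text + "$" * win  # pad so every window has 2*win chars
    window = padded[i + 1 : i + 1 + 2 * win]  # original chars i-win+1 .. i+win
    feats = ["BIAS"]
    for k, ch in enumerate(window):
        feats.append(f"U{k - win + 1}:{ch}")  # offset 0 = char left of the gap
    for k in range(len(window) - 1):
        feats.append(f"B{k - win + 1}:{window[k:k + 2]}")  # B0 straddles the gap
    return feats


def score(weights, feats):
    return sum(weights.get(f, 0.0) for f in feats)


def train(corpus, epochs=10, win=2):
    """Perceptron over independent gap decisions. `corpus` is a list of
    (text, {gap_index: +1 boundary / -1 no boundary}); gaps absent from the
    dict are unannotated (partial annotation) and are simply skipped."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, labels in corpus:
            for i, y in labels.items():
                feats = gap_features(text, i, win)
                if y * score(weights, feats) <= 0:  # mistake (or tie): update
                    for f in feats:
                        weights[f] += y
    return weights


def segment(text, weights, win=2):
    """Tag every gap independently; no structure links neighbouring decisions."""
    words, start = [], 0
    for i in range(len(text) - 1):
        if score(weights, gap_features(text, i, win)) > 0:
            words.append(text[start : i + 1])
            start = i + 1
    words.append(text[start:])
    return words


def least_confident_gaps(corpus, weights, k=2, win=2):
    """Active-learning query: the k unlabelled gaps with the smallest |score|,
    i.e. closest to the decision threshold, as (sentence_index, gap) pairs."""
    cands = []
    for t, (text, labels) in enumerate(corpus):
        for i in range(len(text) - 1):
            if i not in labels:
                cands.append((abs(score(weights, gap_features(text, i, win))), t, i))
    cands.sort()
    return [(t, i) for _, t, i in cands[:k]]


if __name__ == "__main__":
    # Toy data: the second sentence is only partially annotated (gaps 1 and 4).
    corpus = [
        ("私は学生です", {0: +1, 1: +1, 2: -1, 3: +1, 4: -1}),
        ("彼は先生です", {1: +1, 4: -1}),
    ]
    weights = train(corpus)
    print(segment("彼は先生です", weights))  # ['彼', 'は', '先生', 'です']
    print(least_confident_gaps(corpus, weights))  # [(1, 0), (1, 2)]
```

In this toy run, the second sentence's unlabelled gaps 0, 2 and 3 are still segmented correctly. They share character n-grams with the fully annotated sentence. The two unlabelled gaps with the smallest |score| are the ones an active learner would query next. POS tagging is not covered by this sketch.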
