An Empirical Study of Active Learning with Support Vector Machines for Japanese Word Segmentation

We explore whether active learning with Support Vector Machines works well for a non-trivial task in natural language processing, using Japanese word segmentation as a test case. In particular, we examine how the size of the pool of unlabeled examples affects the learning curve. We find that in the early stage of training, a larger pool requires more labeled examples to reach a given level of accuracy than a smaller pool does. In addition, we propose a novel technique that exploits a large number of unlabeled examples effectively by adding them to the pool gradually. Experimental results show that our technique requires fewer labeled examples than the technique used in previous research. To reach 97.0% accuracy, the proposed technique needs only 59.3% of the labeled examples required by the previous technique, and only 17.4% of those required by random sampling.
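The two ideas above, querying the examples closest to the SVM hyperplane and growing the pool gradually instead of exposing all unlabeled data at once, can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: it assumes scikit-learn, substitutes a synthetic binary classification task for the word-segmentation features, and the pool-growth and query-batch constants (POOL_GROWTH, BATCH) are arbitrary choices for the sketch.

```python
# Sketch of margin-based SVM active learning with a gradually growing pool.
# Illustrative only; a toy dataset stands in for word-segmentation features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Seed set of labeled examples; the rest is unlabeled.
labeled = list(rng.choice(len(X), size=20, replace=False))
unseen = [i for i in range(len(X)) if i not in labeled]
pool = unseen[:500]        # start from a small pool
reserve = unseen[500:]     # unlabeled examples to be added gradually
POOL_GROWTH, BATCH = 250, 10

clf = SVC(kernel="linear", C=1.0)
for step in range(50):
    clf.fit(X[labeled], y[labeled])

    # Margin sampling: query the pool examples closest to the hyperplane.
    margins = np.abs(clf.decision_function(X[pool]))
    queried = [pool[i] for i in np.argsort(margins)[:BATCH]]
    labeled.extend(queried)                  # simulate human annotation
    pool = [i for i in pool if i not in queried]

    # Enlarge the pool a little at each round rather than all at once.
    pool.extend(reserve[:POOL_GROWTH])
    reserve = reserve[POOL_GROWTH:]
```

Tracking held-out accuracy against len(labeled) at each step would reproduce the kind of learning curves compared in the paper: the same unlabeled data exposed as one large pool from the start versus fed in gradually.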
