Probabilistic Chinese word segmentation with non-local information and stochastic training

In this article, we focus on Chinese word segmentation by systematically incorporating non-local information based on latent variables and word-level features. Differing from previous work which captures non-local information by using semi-Markov models, we propose an alternative method for modeling non-local information: a latent variable word segmenter employing word-level features. In order to reduce computational complexity of learning non-local information, we further present an improved online training method, which can arrive the same objective optimum with a significantly accelerated training speed. We find that the proposed method can help the learning of long range dependencies and improve the segmentation quality of long words (for example, complicated named entities). Experimental results demonstrate that the proposed method is effective. With this improvement, evaluations on the data of the second SIGHAN CWS bakeoff show that our system is competitive with the state-of-the-art systems.

[1]  Galen Andrew,et al.  A Hybrid Markov/Semi-Markov Conditional Random Field for Sequence Segmentation , 2006, EMNLP.

[2]  Xu Sun,et al.  A Discriminative Latent Variable Chinese Segmenter with Hybrid Word/Character Information , 2009, HLT-NAACL.

[3]  Stephen J. Wright,et al.  Numerical Optimization , 2018, Fundamental Statistical Inference.

[4]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[5]  Nianwen Xue,et al.  Chinese Word Segmentation as Character Tagging , 2003, ROCLING/IJCLCLP.

[6]  William W. Cohen,et al.  Semi-Markov Conditional Random Fields for Information Extraction , 2004, NIPS.

[7]  Stephen Clark,et al.  Chinese Segmentation with a Word-Based Perceptron Algorithm , 2007, ACL.

[8]  Daniel Jurafsky,et al.  A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005 , 2005, IJCNLP.

[9]  Peter L. Bartlett,et al.  Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks , 2008, J. Mach. Learn. Res..

[10]  Trevor Darrell,et al.  Latent-Dynamic Discriminative Models for Continuous Gesture Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Hai Zhao,et al.  A Unified Character-Based Tagging Framework for Chinese Word Segmentation , 2010, TALIP.

[12]  Stanley F. Chen,et al.  A Gaussian Prior for Smoothing Maximum Entropy Models , 1999 .

[13]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[14]  Jianfeng Gao,et al.  A Comparative Study of Parameter Estimation Methods for Statistical Natural Language Processing , 2007, ACL.

[15]  Thomas Emerson,et al.  The Second International Chinese Word Segmentation Bakeoff , 2005, IJCNLP.

[16]  Xu Sun,et al.  Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference , 2008, COLING.

[17]  Xu Sun,et al.  Latent Variable Perceptron Algorithm for Structured Classification , 2009, IJCAI.

[18]  Trevor Darrell,et al.  Hidden Conditional Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Hai Zhao,et al.  Integrating unsupervised and supervised word segmentation: The role of goodness measures , 2011, Inf. Sci..

[20]  Sophia Ananiadou,et al.  Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty , 2009, ACL.

[21]  Eiichiro Sumita,et al.  Subword-based Tagging by Conditional Random Fields for Chinese Word Segmentation , 2006, NAACL.

[22]  Xu Sun,et al.  Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information , 2009, ACL/IJCNLP.

[23]  Xu Sun,et al.  Sequential Labeling with Latent Variables: An Exact Inference Algorithm and its Efficient Approximation , 2009, EACL.

[24]  Yuji Matsumoto,et al.  Combination of Machine Learning Methods for Optimum Chinese Word Segmentation , 2005, IJCNLP.

[25]  Xu Sun,et al.  Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection , 2012, ACL.

[26]  Percy Liang,et al.  Semi-Supervised Learning for Natural Language , 2005 .

[27]  Aitao Chen,et al.  Unigram Language Model for Chinese Word Segmentation , 2005, SIGHAN@IJCNLP 2005.

[28]  Dan Klein,et al.  Discriminative Log-Linear Grammars with Latent Variables , 2007, NIPS.

[29]  Andrew McCallum,et al.  Chinese Segmentation and New Word Detection using Conditional Random Fields , 2004, COLING.