Unsupervised Neural Word Segmentation for Chinese via Segmental Language Modeling

Traditional approaches to unsupervised Chinese word segmentation (CWS) can be roughly classified into discriminative and generative models. The former use carefully designed goodness measures to score candidate segmentations, while the latter search for the segmentation with the highest generative probability. However, while discriminative models can be extended to neural versions in a straightforward way by using neural language models, extending generative models in the same way is non-trivial. In this paper, we propose segmental language models (SLMs) for CWS. Our approach explicitly models the segmental nature of Chinese while preserving several properties of language models. In SLMs, a context encoder encodes the previous context and a segment decoder generates each segment incrementally. To the best of our knowledge, we are the first to propose a neural model for unsupervised CWS, and our model achieves performance competitive with state-of-the-art statistical models on four datasets from the SIGHAN 2005 bakeoff.
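To make the generative view concrete, the sketch below assumes that each segment's probability depends only on the preceding characters and not on how those characters were segmented. Under that assumption, the probability of a sentence can be summed over all segmentations with a forward-style dynamic program, and the most probable segmentation can be recovered with Viterbi decoding. This is an illustration, not the paper's implementation: `toy_score`, `TOY_LEXICON`, and `max_len` are hypothetical stand-ins for a trained SLM (context encoder plus segment decoder) and its maximum segment length.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical scorer interface: log p(segment | preceding characters).
# A real SLM would compute this with a neural context encoder and segment decoder.
SegmentScorer = Callable[[str, str], float]

# Hypothetical toy lexicon standing in for a trained model (probabilities are made up).
TOY_LEXICON = {"我们": 0.3, "喜欢": 0.3, "学习": 0.3,
               "我": 0.02, "们": 0.01, "喜": 0.01, "欢": 0.01, "学": 0.02, "习": 0.01}

def toy_score(context: str, segment: str) -> float:
    # Ignores the context; unknown segments get a small floor probability.
    return math.log(TOY_LEXICON.get(segment, 1e-6))

def logsumexp(values: List[float]) -> float:
    # Numerically stable log(sum(exp(v))).
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def marginal_log_prob(text: str, score: SegmentScorer, max_len: int) -> float:
    """Forward algorithm: log of the sum over all segmentations of the product of segment probabilities."""
    n = len(text)
    alpha = [-math.inf] * (n + 1)  # alpha[j] = log p(text[:j]) summed over segmentations
    alpha[0] = 0.0
    for j in range(1, n + 1):
        terms = []
        for k in range(1, min(max_len, j) + 1):
            i = j - k  # last segment is text[i:j], generated after context text[:i]
            terms.append(alpha[i] + score(text[:i], text[i:j]))
        alpha[j] = logsumexp(terms)
    return alpha[n]

def viterbi_segment(text: str, score: SegmentScorer, max_len: int) -> Tuple[List[str], float]:
    """Return the highest-probability segmentation and its log probability."""
    n = len(text)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)  # back[j] = start index of the best last segment ending at j
    best[0] = 0.0
    for j in range(1, n + 1):
        for k in range(1, min(max_len, j) + 1):
            i = j - k
            cand = best[i] + score(text[:i], text[i:j])
            if cand > best[j]:
                best[j], back[j] = cand, i
    segments, j = [], n
    while j > 0:  # follow back-pointers from the end of the sentence
        segments.append(text[back[j]:j])
        j = back[j]
    segments.reverse()
    return segments, best[n]
```

For example, `viterbi_segment("我们喜欢学习", toy_score, max_len=2)` picks 我们 / 喜欢 / 学习 under the toy lexicon. Because the scorer sees only the character prefix, both loops make O(n · max_len) scorer calls, and a neural scorer could plug into the same interface.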
