Improving Successor Variety for Morphological Segmentation

Successor variety is a commonly used measure for segmentation in language processing. It is based on the simple idea that a large variety of letters (or phonemes) following an initial word (or utterance) segment indicates a possible boundary. The measure dates back to Harris (1955), and several methods based on it have been used in the literature, particularly for segmenting words into morphemes. However, few studies have analyzed the measure itself. Even though the idea is simple and effective, current use in the literature does not exploit the measure to its full extent because of a number of problems with successor variety scores. This paper addresses these problems by introducing a normalization method and demonstrates, through segmentation experiments on two typologically different languages, the effectiveness of this improvement on the morphological segmentation task.
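As an illustration of the basic idea only, the raw successor variety of a word prefix can be sketched as the number of distinct letters that follow that prefix in a word list. The toy lexicon, the function names `build_successors` and `sv_scores`, and the choice to count the end of a word (marked `#`) as a successor are all assumptions made for this sketch. It does not implement the normalization method the paper proposes, which the abstract does not specify.

```python
from collections import defaultdict


def build_successors(lexicon):
    """Map each prefix to the set of symbols that follow it in the lexicon."""
    successors = defaultdict(set)
    for word in lexicon:
        for i in range(len(word)):
            successors[word[:i]].add(word[i])
        # Assumption: the end of a word counts as one more possible successor.
        successors[word].add("#")
    return successors


def sv_scores(word, successors):
    """Raw successor variety after each proper, non-empty prefix of `word`."""
    return [len(successors.get(word[:i], ())) for i in range(1, len(word))]


# Hypothetical toy lexicon, for illustration only.
lexicon = ["read", "reads", "reading", "reader", "real", "red"]
table = build_successors(lexicon)
scores = sv_scores("reading", table)
print(scores)  # [1, 2, 2, 4, 1, 1]
```

In this toy example the score peaks after the prefix `read`, which would suggest a boundary between `read` and `ing`.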

Çağrı Çöltekin
