Paper Information - The Semi-Automatic Construction of Part-Of-Speech Taggers for Specific Languages by Statistical Methods

Economic activity is becoming increasingly globalized, so we must handle documents written not only in English but also in other languages. To enable processors for any language to be developed quickly, we have been building a framework based on statistical processing and machine learning. So far, we have confirmed that part-of-speech (POS) taggers for several target languages can be built using this framework together with information from source languages. In this paper, we describe a method for acquiring POS lexicons and a method for generating supervision for POS sequences, both of which are used to learn grammatical models of the target languages. We also report experimental results on building POS taggers for Portuguese and Indonesian using several source languages.

Hiromi Wakaki | Tomohiro Yamasaki | Masaru Suzuki
