Chinese Terminology Extraction Using EM-Based Transfer Learning Method

As an important part of information extraction, terminology extraction has attracted growing attention. Currently, statistical and rule-based methods are used to extract terminology within a specific domain. However, the cross-domain terminology extraction task has not yet been well addressed. In this paper, we propose an EM-based transfer learning method for cross-domain Chinese terminology extraction. First, a naive Bayes model is learned from the source domain. Then an EM-based transfer learning algorithm adapts the classifier learned from the source domain to the target domain, whose subject matter and data distribution differ from those of the source domain. The advantage of the proposed method is that it enables the target domain to use knowledge from the source domain. Experimental results between the computer domain and the environment domain show that Chinese terminology extraction with the EM-based transfer learning method significantly outperforms the traditional statistical terminology extraction method.
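
The two-step procedure described above (train naive Bayes on the source domain, then adapt it to the target domain with EM) can be illustrated with a minimal sketch. It assumes each candidate term is represented by a vector of non-negative feature counts and labeled 1 (term) or 0 (non-term) in the source domain only. The function names, the Laplace smoothing constant `alpha`, and the `src_weight` parameter are illustrative assumptions; the abstract does not specify the features or the exact weighting scheme, so this is not the authors' implementation.

```python
import numpy as np

def fit_nb(X, resp, alpha=1.0):
    """M-step: estimate multinomial naive Bayes parameters from soft labels.

    X:    (n_samples, n_features) non-negative feature counts
    resp: (n_samples, n_classes) per-sample class weights
    """
    class_mass = resp.sum(axis=0) + alpha           # smoothed class mass
    log_prior = np.log(class_mass / class_mass.sum())
    feat_counts = resp.T @ X + alpha                # Laplace smoothing
    log_lik = np.log(feat_counts / feat_counts.sum(axis=1, keepdims=True))
    return log_prior, log_lik

def posterior(X, log_prior, log_lik):
    """E-step: P(class | candidate) under the current model."""
    joint = X @ log_lik.T + log_prior
    joint -= joint.max(axis=1, keepdims=True)       # numerical stability
    p = np.exp(joint)
    return p / p.sum(axis=1, keepdims=True)

def em_transfer(X_src, y_src, X_tgt, n_classes=2, n_iter=20, src_weight=1.0):
    """Adapt a source-domain naive Bayes model to unlabeled target data.

    src_weight (hypothetical knob) scales how much the labeled source
    candidates count relative to the unlabeled target candidates.
    """
    resp_src = np.eye(n_classes)[y_src]             # hard labels as one-hot
    # Step 1: initial classifier learned from the source domain only.
    log_prior, log_lik = fit_nb(X_src, resp_src)
    X_all = np.vstack([X_src, X_tgt])
    for _ in range(n_iter):
        # E-step: soft-label the target candidates with the current model.
        resp_tgt = posterior(X_tgt, log_prior, log_lik)
        # M-step: re-estimate from source labels plus target soft labels.
        resp_all = np.vstack([src_weight * resp_src, resp_tgt])
        log_prior, log_lik = fit_nb(X_all, resp_all)
    return log_prior, log_lik
```

After adaptation, `posterior(X_tgt, log_prior, log_lik)` gives the probability that each target-domain candidate is a term. Features that appear only in the target domain can acquire class-specific likelihoods through the soft labels, which is how source knowledge carries over in this sketch.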
