Impacts of Features and Tagging Schemes on Chunking

Text chunking, also known as shallow parsing, is an important task in natural language processing and is useful for many other tasks. Using discriminative machine learning methods and extensive experiments, this paper investigates how different tagging schemes and feature types affect chunking efficiency and effectiveness on corpora with different chunk specifications and languages. We find that, on all the corpora, using more features and more fine-grained tagging schemes increases both training and tagging time, although tagging time is less affected than training time. Our investigation also shows that methods with more features and more fine-grained tagging schemes achieve better performance, but the chunk specification of the corpus may influence the choice.
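To make "more fine-grained tagging schemes" concrete, the sketch below encodes the same chunks under two widely used schemes of different granularity. The abstract does not name the schemes that were compared, so IOB2 and IOBES are assumptions chosen for illustration only; the function names, the example sentence, and its chunk spans are likewise hypothetical.

```python
from typing import List, Tuple

# A chunk is (start index, exclusive end index, chunk type).
Chunk = Tuple[int, int, str]


def to_iob2(n_tokens: int, chunks: List[Chunk]) -> List[str]:
    """Coarser scheme: B- marks the first token of a chunk, I- the rest."""
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def to_iobes(n_tokens: int, chunks: List[Chunk]) -> List[str]:
    """Finer scheme: also marks chunk ends (E-) and single-token chunks (S-)."""
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        if end - start == 1:
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


# Hypothetical example sentence and chunk spans (not taken from the paper).
tokens = ["He", "reckons", "the", "current", "account", "deficit"]
chunks = [(0, 1, "NP"), (1, 2, "VP"), (2, 6, "NP")]

iob2_tags = to_iob2(len(tokens), chunks)
iobes_tags = to_iobes(len(tokens), chunks)

for token, coarse, fine in zip(tokens, iob2_tags, iobes_tags):
    print(f"{token:10s} {coarse:6s} {fine}")
```

In this sketch, a finer scheme gives the learner more distinct labels per chunk type (four instead of two here), which is one plausible reason a finer scheme could cost more time, as the abstract reports.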
