Shallow Parsing for South Asian Languages

As part of the IJCAI workshop on "Shallow Parsing for South Asian Languages", a contest was held in which participants trained and tested their shallow parsing systems for Hindi, Bengali and Telugu. This paper gives a complete account of the contest: how the data for the three languages was released, how the participating systems performed, and which approaches were followed for POS tagging and chunking. We conclude with an analysis of the systems that offers insights into directions for future research on shallow parsing for South Asian languages.
