Special issue on statistical learning of natural language structured input and output

During the last decade, machine learning and, in particular, statistical approaches have become increasingly important for research in Natural Language Processing (NLP) and Computational Linguistics. Nowadays, most stakeholders in the field use machine learning, as it can significantly improve both system design and performance. However, machine learning requires careful parameter tuning and feature engineering to represent language phenomena. Feature engineering becomes more complex when the system input/output data is structured, since the designer has to both (i) engineer features for representing structure and model interdependent layers of information, which is usually a non-trivial task; and (ii) generate a structured output using classifiers that, in their original form, were developed only for classification or regression. Research in empirical NLP has tackled this problem by constructing output structures as a combination of the predictions of independent local classifiers, possibly followed by post-processing heuristics that correct incompatible outputs by enforcing global properties. More recently, advances in statistical learning theory, namely structured output spaces and kernel methods, have provided techniques for directly encoding dependencies between data items in a learning algorithm that performs global optimization. Within this framework, this special issue aims at studying, comparing, and reconciling the typical domain/task-specific NLP approaches to structured data with the most advanced machine learning methods. In particular, the selected papers analyze the use of diverse structured input/output approaches, ranging from re-ranking to joint constraint-based global models, for diverse natural language tasks, namely document ranking, syntactic parsing, sequence supertagging, and relation extraction between terms and entities.
Overall, the experience with this special issue shows that, although a definitive unifying theory for encoding and generating structured information in NLP applications is still far from being established, some interesting and effective best practices can be defined to guide practitioners in modeling their own natural language applications on complex data.
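
The contrast drawn above, between combining independent local predictions with a post-processing heuristic and optimizing globally over the whole output structure, can be illustrated with a minimal sketch. It is not taken from any of the selected papers: the BIO label set, the per-token scores, the repair heuristic, and the function names are all assumptions made for illustration, and Viterbi decoding stands in here for the more general global models discussed in the issue.

```python
import math

# Hypothetical label set for a chunking-style task (an assumption of this sketch).
LABELS = ["O", "B", "I"]


def allowed(prev, cur):
    """BIO constraint: 'I' may only follow 'B' or 'I'."""
    return not (cur == "I" and prev not in ("B", "I"))


def local_decode(scores):
    """Independent per-token argmax, then a heuristic repair pass."""
    labels = [max(LABELS, key=lambda lab: s[lab]) for s in scores]
    prev = "O"  # the sequence start is treated as if preceded by 'O'
    for i, lab in enumerate(labels):
        if not allowed(prev, lab):
            labels[i] = "B"  # illustrative heuristic: promote an orphan 'I' to 'B'
        prev = labels[i]
    return labels


def global_decode(scores):
    """Viterbi search: best-scoring sequence among those satisfying the constraint."""
    # best[lab] = (total score, path) of the best valid sequence ending in lab
    best = {}
    for lab in LABELS:
        start = scores[0][lab] if allowed("O", lab) else -math.inf
        best[lab] = (start, [lab])
    for s in scores[1:]:
        new_best = {}
        for cur in LABELS:
            candidates = [
                (best[p][0] + s[cur], best[p][1] + [cur])
                for p in LABELS
                if allowed(p, cur)
            ]
            new_best[cur] = max(candidates, key=lambda c: c[0])
        best = new_best
    return max(best.values(), key=lambda c: c[0])[1]


# Made-up local classifier scores for a three-token input.
EXAMPLE_SCORES = [
    {"O": 0.5, "B": 0.4, "I": 0.1},
    {"O": 0.1, "B": 0.2, "I": 0.7},
    {"O": 0.6, "B": 0.1, "I": 0.3},
]

if __name__ == "__main__":
    print("local + repair:", local_decode(EXAMPLE_SCORES))
    print("global:        ", global_decode(EXAMPLE_SCORES))
```

On these made-up scores the two strategies disagree: the local pipeline repairs the invalid second label in isolation, whereas the global search revises the first label as well and reaches a valid sequence with a higher total score.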
