Machine Learning in Natural Language Processing

This thesis examines the use of machine learning techniques in various natural language processing tasks, mainly information extraction from texts. The objectives are to improve the adaptability of information extraction systems to new thematic domains (or even languages) and to improve their performance using as few resources (either linguistic or human) as possible. The thesis follows two main axes: a) the study and assessment of existing machine learning algorithms, mainly in the stages of linguistic pre-processing (such as part-of-speech tagging) and named-entity recognition, and b) the creation of a new machine learning algorithm and its assessment on synthetic data, as well as on real-world data for the task of relation extraction between named entities. This new algorithm belongs to the category of inductive grammar learning and can infer context-free grammars from positive examples only.
