Automatic Extension of Semantic Lexicons with a Bootstrapping Algorithm: Using Corpora to Learn Semantic Features

Where is the information I need in this huge pile of data? This question will be asked more and more often in our information society, and the only way to answer it on a large scale is to process the data automatically. Since this data is most often unstructured and available only in natural language, we need to understand the inner structure of natural languages and employ the tools of computer science to extract the information we need effectively. For this task to succeed, a good lexicon is needed. Many fields of advanced natural language processing, such as information retrieval, word sense disambiguation, and semantic web applications, are based on lexical information. Because the manual creation of high-quality lexical resources is very expensive and time-consuming, there is a need for automatic or semi-automatic tools to create them. This book investigates and extends a bootstrapping approach that makes it possible to extend high-quality lexical resources with the help of very large corpora. It is directed towards researchers and technical staff interested in either automatically creating a new lexicon or extending their general or domain-specific lexicons to new domains.
