Learning to Extract Entities from Labeled and Unlabeled Text

We describe and evaluate algorithms for learning to extract semantic classes from sentences in text documents using a minimum of training information. The thesis of this research is that information extraction can be automated efficiently, that is, learned from tens of labeled training examples instead of thousands, by exploiting the redundancy and separability of two kinds of features: noun phrases and contexts. We exploit this redundancy and separability in two ways: (1) in the algorithms for learning semantic classes, and (2) in novel algorithms for active learning, which lead to better extractors for a given amount of user labeling effort.
