Paper Information - Multi-Field Information Extraction and Cross-Document Fusion

In this paper, we examine the task of extracting a set of biographic facts about target individuals from a collection of Web pages. We automatically annotate training text with positive and negative examples of fact extractions and train Rote, Naive Bayes, and Conditional Random Field extraction models for fact extraction from individual Web pages. We then propose and evaluate methods for fusing the extracted information across documents to return a consensus answer. A novel cross-field bootstrapping method leverages data interdependencies to yield improved performance.
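As a rough illustration of what fusing extractions across documents into a consensus answer can look like, here is a minimal sketch that uses simple majority voting per field. The abstract does not say which fusion methods the paper evaluates, so the voting rule, the normalization step, and the names `fuse_by_vote` and `fuse_fields` are all assumptions made for this example rather than the authors' method.

```python
from collections import Counter


def fuse_by_vote(candidates):
    """Return the most frequent non-empty value among per-document extractions.

    Assumption: values are compared after trimming whitespace and lowercasing.
    """
    normalized = [c.strip().lower() for c in candidates if c and c.strip()]
    if not normalized:
        return None
    counts = Counter(normalized)
    # most_common(1) returns the single highest-count (value, count) pair.
    value, _ = counts.most_common(1)[0]
    return value


def fuse_fields(per_document):
    """Group extracted values by field name, then vote within each field.

    `per_document` is a list of dicts, one per Web page, mapping a
    hypothetical field name (e.g. "birth_year") to the extracted string.
    """
    by_field = {}
    for doc in per_document:
        for field, value in doc.items():
            by_field.setdefault(field, []).append(value)
    return {field: fuse_by_vote(values) for field, values in by_field.items()}


if __name__ == "__main__":
    pages = [
        {"birth_year": "1879", "birth_place": "Ulm"},
        {"birth_year": "1879"},
        {"birth_year": "1897", "birth_place": "ulm "},
    ]
    print(fuse_fields(pages))
```

Voting of this kind relies on redundancy: a fact repeated on several pages outvotes an isolated extraction error. A weighted variant could replace the raw counts with per-extraction confidence scores from the underlying model.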

David Yarowsky | Gideon S. Mann
