Information extraction from calls for papers with conditional random fields and layout features

For members of the research community it is vital to stay informed about conferences, workshops, and other research meetings relevant to their field. These events are typically announced in calls for papers (CFPs) that are distributed via mailing lists. We employ Conditional Random Fields for the task of extracting key information such as conference names, titles, dates, locations and submission deadlines from CFPs. Extracting this information from CFPs automatically has applications in building automated conference calendars and search engines for CFPs. We combine a variety of features, including generic token classes, domain-specific dictionaries and layout features. Layout features prove particularly useful in the absence of grammatical structure, improving average F1 by 30% in our experiments.

[1]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[2]  Burr Settles,et al.  Biomedical Named Entity Recognition using Conditional Random Fields and Rich Feature Sets , 2004, NLPBA/BioNLP.

[3]  W. Bruce Croft,et al.  Table extraction using conditional random fields , 2003, DG.O.

[4]  John D. Lafferty,et al.  Inducing Features of Random Fields , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  William W. Cohen,et al.  Learning to Extract Signature and Reply Lines from Email , 2004, CEAS.

[6]  Ellen Riloff,et al.  Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping , 1999, AAAI/IAAI.

[7]  Rob Malouf,et al.  A Comparison of Algorithms for Maximum Entropy Parameter Estimation , 2002, CoNLL.

[8]  Matthew Hurst,et al.  Layout and Language: Integrating Spatial and Linguistic Knowledge for Layout Understanding Tasks , 2000, COLING.

[9]  Christopher D. Manning,et al.  Template Sampling for Leveraging Domain Knowledge in Information Extraction , 2005 .

[10]  Mitchell P. Marcus,et al.  Text Chunking using Transformation-Based Learning , 1995, VLC@ACL.

[11]  Ralph Grishman,et al.  Bootstrapped Learning of Semantic Classes from Positive and Negative Examples , 2003 .

[12]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[13]  Andrew McCallum,et al.  Maximum Entropy Markov Models for Information Extraction and Segmentation , 2000, ICML.

[14]  Andrew McCallum,et al.  Accurate Information Extraction from Research Papers using Conditional Random Fields , 2004, NAACL.