Parsing and Maintaining Bibliographic References - Semi-supervised Learning of Conditional Random Fields with Constraints

This paper presents some key components of our workflow for handling bibliographic information. In particular, we compare several approaches to parsing bibliographic references with conditional random fields (CRFs). We concentrate on the case where only a few labeled training instances are available. To improve the labeling results, prior knowledge about the bibliographic domain is incorporated into CRF training through different constraint models. We show that our labeling approach achieves results comparable to, and in some cases better than, those of other state-of-the-art approaches. We then show how, for about half of our reference strings, the correlation between journal title, volume, and publication year could be used to identify the correct journal even when the journal title abbreviation was ambiguous.
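The idea of resolving an ambiguous journal abbreviation through the volume/year correlation can be illustrated with a minimal sketch. It assumes, hypothetically, that each candidate journal publishes a constant number of volumes per year starting from a known first year. The names `Journal`, `expected_year`, and `disambiguate`, as well as the journal titles and the abbreviation "J. Ex.", are invented for illustration and are not the authors' implementation or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journal:
    """Hypothetical catalog entry; real catalogs may record volumes differently."""
    title: str
    first_year: int            # year in which volume 1 appeared (assumed known)
    volumes_per_year: int = 1  # assumed constant over the journal's lifetime


def expected_year(journal: Journal, volume: int) -> int:
    """Year in which `volume` should appear under the constant-rate assumption."""
    return journal.first_year + (volume - 1) // journal.volumes_per_year


def disambiguate(candidates: list[Journal], volume: int, year: int,
                 tolerance: int = 0) -> list[Journal]:
    """Keep only the candidates whose volume/year relation is consistent.

    `candidates` are all journals sharing the parsed (ambiguous) abbreviation.
    A single survivor means the abbreviation is resolved; zero or several
    survivors mean the volume/year pair does not decide the case.
    """
    return [j for j in candidates
            if abs(expected_year(j, volume) - year) <= tolerance]


# Two hypothetical journals that share the abbreviation "J. Ex."
catalog = {
    "J. Ex.": [
        Journal("Journal of Examples", first_year=1950),
        Journal("Journal of Exercises", first_year=1990, volumes_per_year=2),
    ],
}

matches = disambiguate(catalog["J. Ex."], volume=40, year=1989)
print([j.title for j in matches])  # ['Journal of Examples']
```

Under this assumption, volume 40 from 1989 fits only the first journal, while volume 40 from 2009 would fit only the second. A real system would more plausibly look up observed volume/year pairs in a journal database, since publication rates change over time; the `tolerance` parameter is a hypothetical knob for absorbing small offsets between nominal and actual publication years.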

Sebastian Lindner | Winfried Höhn
