Paper Information - Recognizing Citations in Public Comments

Abstract: Notice and comment rulemaking is central to how U.S. federal agencies craft new regulation. E-rulemaking, the process of soliciting and considering public comments that are submitted electronically, poses a challenge for agencies. The large volume of comments received makes it difficult to distill and address the most substantive concerns of the public. This work attempts to alleviate this burden by applying existing machine learning techniques to the problem of recognizing citation sentences. A citation in this context is defined as a statement in which the author of the public comment references an external source of factual information that is associated with a specific person or organization. The problem is formulated as a binary classification problem: Is a specific person or organization mentioned in a sentence being referenced as an external source of information? We show that our definition of a citation is reproducible by human judges and that citations can be detected using machine learning techniques with some success. Casting this as a machine learning problem requires selecting an appropriate representation of the sentence. Several feature sets are evaluated individually and in combination. Superior results are obtained by combining feature sets. Syntactic features, which characterize the structure of the sentence rather than its content, significantly improve accuracy when combined with other features, but not when used in isolation. Although prediction error rate is adequate, coverage could be improved. An error analysis enumerates short-term and long-term challenges that must be overcome to improve recall.
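The idea of representing a sentence by several feature sets and then combining them can be sketched roughly as follows. This is an illustration only, not the authors' feature extraction: the function names, the `REPORTING_VERBS` cue list, and the position-based "structural" features are all assumptions made up for this sketch, and the structural features are a crude stand-in for real parser-derived syntactic features.

```python
from collections import Counter

# Hypothetical cue list, invented for illustration (not taken from the paper).
REPORTING_VERBS = {"according", "reported", "found", "states", "said", "estimates"}


def lexical_features(tokens):
    """Bag-of-words counts; the 'lex:' prefix keeps this set separate from others."""
    return {f"lex:{tok}": count for tok, count in Counter(tokens).items()}


def structural_features(tokens, entity_index):
    """Crude stand-in for syntactic features: token positions only, no parser."""
    feats = {"struct:entity_is_first": float(entity_index == 0)}
    cue_positions = [i for i, tok in enumerate(tokens) if tok in REPORTING_VERBS]
    if cue_positions:
        nearest = min(abs(i - entity_index) for i in cue_positions)
        # 1.0 if a reporting cue occurs within two tokens of the entity mention.
        feats["struct:cue_within_2"] = float(nearest <= 2)
    return feats


def combine(*feature_sets):
    """Merge several feature dictionaries into one representation."""
    merged = {}
    for feature_set in feature_sets:
        merged.update(feature_set)
    return merged


# One classification instance: a sentence plus the entity mention being judged.
tokens = "the epa reported that mercury levels rose".split()
features = combine(
    lexical_features(tokens),
    structural_features(tokens, entity_index=1),  # "epa" is the candidate source
)
```

The resulting dictionary could be passed to any binary classifier that accepts sparse named features; evaluating each feature set alone and then the merged dictionary mirrors the individual-versus-combined comparison described in the abstract.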

Jaime Arguello | Jamie Callan | Stuart W. Shulman
