Identifying civilians killed by police with distantly supervised entity-event extraction

We propose a new, socially impactful task for natural language processing: from a news corpus, extract the names of persons who have been killed by police. We present a newly collected police fatality corpus, which we release publicly, and a model for this task that uses EM-based distant supervision with logistic regression and convolutional neural network classifiers. Our model outperforms two off-the-shelf event extraction systems, and in some cases it can suggest candidate victim names faster than one of the major manually collected police fatality databases.

Katherine A. Keith | Brendan T. O'Connor | Abram Handler | Michael Pinkham | Cara Magliozzi | J. McDuffie
