Identifying civilians killed by police with distantly supervised entity-event extraction

We propose a new, socially impactful task for natural language processing: extracting, from a news corpus, the names of people who have been killed by police. We release a newly collected police fatality corpus and present a model for this task that uses EM-based distant supervision with logistic regression and convolutional neural network classifiers. Our model outperforms two off-the-shelf event extraction systems and, in some cases, suggests candidate victim names faster than one of the major manually collected police fatality databases.
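To make "EM-based distant supervision with logistic regression" concrete, the following is a minimal sketch under stated assumptions, not the released model. It assumes that each candidate name (an entity) has several mention sentences, that only the entity-level label from a fatality database is observed, and that an entity is positive when at least one of its mentions is (a noisy-or aggregation). The function names, learning rate, and toy features are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def noisy_or(p, entity_ids, n_entities):
    """P(entity positive) = 1 - product over its mentions of (1 - p_i)."""
    log_none = np.zeros(n_entities)
    np.add.at(log_none, entity_ids, np.log1p(-p))  # sum log(1 - p_i) per entity
    return 1.0 - np.exp(log_none)

def e_step(p, entity_ids, entity_labels):
    """Soft mention labels q_i = P(z_i = 1 | entity label, current model)."""
    p = np.clip(p, 1e-6, 1 - 1e-6)
    p_any = noisy_or(p, entity_ids, len(entity_labels))
    q = p / p_any[entity_ids]  # posterior given that at least one mention is positive
    # Assumption: mentions of a negative entity are all negative.
    return np.where(entity_labels[entity_ids] == 1, q, 0.0)

def m_step(X, q, w, lr=0.5, steps=200):
    """Fit logistic regression to soft labels q by gradient ascent."""
    for _ in range(steps):
        w = w + lr * X.T @ (q - sigmoid(X @ w)) / len(q)
    return w

def train(X, entity_ids, entity_labels, iters=10):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        q = e_step(sigmoid(X @ w), entity_ids, entity_labels)
        w = m_step(X, q, w)
    return w

# Toy data (hypothetical): column 0 is a bias term, column 1 a binary cue feature.
X = np.array([[1.0, 1.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
entity_ids = np.array([0, 0, 1, 1])   # mentions 0-1 belong to entity 0, mentions 2-3 to entity 1
entity_labels = np.array([1, 0])      # entity 0 is in the database, entity 1 is not
w = train(X, entity_ids, entity_labels)
entity_scores = noisy_or(sigmoid(X @ w), entity_ids, len(entity_labels))
```

In this sketch the E-step spreads the entity-level label over its mentions in proportion to the current classifier's confidence, and the M-step refits the classifier to those soft labels. A convolutional neural network could stand in for the logistic regression in the M-step under the same scheme.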
