Extracting Structured Information via Automatic + Human Computation

We present a system for extracting structured information from unstructured text using a combination of information retrieval, natural language processing, machine learning, and crowdsourcing. We test our pipeline by building a structured database of gun violence incidents in the United States. The results of our pilot study demonstrate that the proposed methodology is a viable way of collecting large-scale, up-to-date data for public health, public policy, and social science research.
