Introducing textual analysis tools for policy informatics: a case study of e-petitions

Electronic petitioning (e-petitioning) provides a unique and promising channel through which people can directly express their policy preferences. E-petitions can be viewed as a natural laboratory for identifying subjects of public interest, and policy analysts can therefore use them to understand social needs and constraints. In this paper, we introduce textual analysis tools, such as named entity recognition (NER) and topic modeling, and extract three types of novel variables (informativeness, named entities, and 21 topics) from We the People petition texts. The regression results show that informativeness, named location, and several topics are significantly correlated with the log of the signature counts. These exploratory but promising results indicate that textual analysis tools can complement traditional statistical methods by providing descriptive measures that are helpful for making causal inferences from electronic petition data. We believe these tools will facilitate policy analysis and policy informatics by enabling meaningful use of large online archives of public expression about policy preferences.
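To make the shape of this analysis concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical informativeness proxy (the number of distinct word types in a petition), uses a handful of made-up toy petitions, and fits a one-predictor ordinary least squares line to the log of the signature counts. The function names, the proxy, and the data are all illustrative assumptions; the study itself uses more variables (named entities and 21 topics) and real We the People data.

```python
import math


def informativeness(text):
    """Hypothetical proxy (an assumption, not the paper's measure):
    the number of distinct lowercase word types in the text."""
    return len(set(text.lower().split()))


def ols_slope_intercept(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up toy data for illustration only: (petition text, signature count).
toy_petitions = [
    ("Fund rural broadband", 120),
    ("Fund rural broadband for schools and public libraries", 950),
    ("Protect national parks", 300),
    ("Protect national parks from new drilling leases in Utah", 4100),
]

features = [informativeness(text) for text, _ in toy_petitions]
log_signatures = [math.log(count) for _, count in toy_petitions]

slope, intercept = ols_slope_intercept(features, log_signatures)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The dependent variable is log-transformed, as in the abstract, because signature counts tend to be heavily skewed; the slope is then read as the change in log signatures per unit of the text feature. A correlation found this way is descriptive and does not by itself establish a causal effect.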
