Gathering Public Concerns from Web Towards Building Corpus of Japanese Regional Concerns

The importance of concern assessment has increased in Japanese regional communities. We have developed an e-Participation web platform based on a Linked Open Data set called SOCIA (Social Opinions and Concerns for Ideal Argumentation). To improve the text mining technologies that support concern assessment, building a corpus of public concerns is an urgent task. Two issues arise in using the SOCIA dataset as a corpus: (1) the reliability of annotations must be managed, and (2) noisy text that is not relevant to public concerns must be filtered out. To address the first issue, we incorporate a schema for describing the meta-context of each annotation: who the annotator is, whether the annotator is a human or a software agent, and how reliable the annotation is. To address the second, we investigate how the features of concerns differ from those of non-concerns in Japanese microblog posts (i.e., tweets). In this investigation, we address sample selection bias by formulating a novel metric for ranking features, bias-penalized information gain (BPIG).
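
As a point of reference for the feature-ranking step, the following is a minimal sketch of standard information gain for a binary "feature present / feature absent" split over labeled posts. It is not BPIG: the abstract does not define the bias penalty, so no penalty term is included here. The function names, the token sets, and the labels are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, feature):
    """Standard information gain of a feature's presence or absence.

    docs is a list of token sets, labels the matching class labels.
    This is plain information gain, with no bias penalty applied.
    """
    present = [y for d, y in zip(docs, labels) if feature in d]
    absent = [y for d, y in zip(docs, labels) if feature not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def rank_features(docs, labels):
    """Return all features sorted by descending information gain."""
    vocab = set().union(*docs)
    return sorted(vocab,
                  key=lambda f: information_gain(docs, labels, f),
                  reverse=True)


# Hypothetical toy data: each post is reduced to a set of tokens.
docs = [
    {"flood", "worry"},
    {"road", "worry"},
    {"lunch", "tasty"},
    {"lunch", "sunny"},
]
labels = ["concern", "concern", "other", "other"]

ranking = rank_features(docs, labels)
```

In this toy data, "worry" and "lunch" each separate the two classes perfectly and therefore rank above tokens such as "flood" that occur in only one post. A ranking of this kind inherits whatever bias the sample has, which is the problem the abstract says BPIG is formulated to address.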
