Blog text analysis using topic modeling, named entity recognition and sentiment classifier combine

This paper describes our experimental work on computational analysis of socio-political blog data through a novel combination of language processing and visualization techniques. We have designed an integrated framework that uses topic modeling, entity extraction and sentiment analysis to draw sociologically relevant inferences from unstructured, free-form blogosphere data. The dataset comprises more than 9290 blog posts on socio-political events related to the Arab Spring. We attempt to extract important inferences from the dataset, such as key themes, persons, places, organizations and the overall sentiment orientation of the content around different entities in the texts. We attempt to validate these inferences through manual checks and Google search trend statistics. The results obtained are relevant to the events studied and demonstrate the usefulness of our approach for computational analysis of social media data.
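To make the idea of "sentiment orientation of the content around different entities" concrete, the following is a minimal sketch and not the framework described in the paper. It assumes a tiny hand-made sentiment lexicon (`POSITIVE`, `NEGATIVE`) and a crude capitalization heuristic in place of a real named entity recognizer. Topic modeling is left out entirely, and the example posts are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon (assumption); a real system would use a trained
# sentiment classifier or a full sentiment lexicon.
POSITIVE = {"hope", "freedom", "peaceful", "victory", "support"}
NEGATIVE = {"violence", "crackdown", "fear", "corrupt", "killed"}


def sentence_score(sentence):
    """Return positive minus negative lexicon hits for one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def candidate_entities(sentence):
    """Crude stand-in for NER (assumption): capitalized tokens that are
    not the first token of the sentence."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    return {t for t in tokens[1:] if t[0].isupper()}


def entity_sentiment(posts):
    """Sum sentence-level scores over every entity mentioned in that sentence."""
    totals = defaultdict(int)
    for post in posts:
        for sentence in re.split(r"(?<=[.!?])\s+", post):
            score = sentence_score(sentence)
            for entity in candidate_entities(sentence):
                totals[entity] += score
    return dict(totals)


# Invented example posts, not taken from the paper's dataset.
posts = [
    "Protesters in Cairo voiced hope for freedom. The crackdown in Tripoli spread fear.",
    "Many bloggers praised the peaceful marches in Cairo.",
]
scores = entity_sentiment(posts)
print(scores)
```

Scoring at the sentence level keeps the sentiment attached to the entities it co-occurs with, rather than spreading one post-level score over every entity in the post. A real pipeline would replace both stand-ins with proper models.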
