Hadoop-based Framework for Information Extraction from Social Text

Social data analysis has become a real business requirement, given the frequent use of social media as a new business strategy. However, the volume, velocity, and variety of social data make it challenging to store and process. In a previous contribution (Jenhani et al., 2016a, 2016b), we proposed an event extraction system that focused only on data variety and did not handle the volume and velocity dimensions, so that solution cannot be considered a big data system. In this work, we port the previously proposed system to a parallel and distributed framework in order to reduce the complexity of the task and scale up to larger, continuously growing volumes of data. We propose two loosely coupled Hadoop clusters for entity recognition and event extraction. In our experiments, we ran a time test and an accuracy test to check the performance of the system on extracting drug abuse behavioral events from 1,000,000 tweets. The Hadoop-based system achieves better performance than the previous system.
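
To make the map/reduce decomposition of entity recognition concrete, the following is a minimal in-process sketch in Python. It is not the system described above. The lexicon entries, the function names (`map_tweet`, `reduce_entities`, `run`), and the whitespace tokenization are all assumptions made for illustration, and the code runs in a single process rather than on a Hadoop cluster.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical lexicon mapping surface forms to entity types (illustrative only).
LEXICON = {"oxycodone": "DRUG", "xanax": "DRUG", "snorted": "ROUTE"}


def map_tweet(tweet_id, text):
    """Map step: emit (entity_type, (tweet_id, token)) for each lexicon hit."""
    for token in text.lower().split():
        token = token.strip(".,!?#@")  # drop common leading/trailing punctuation
        if token in LEXICON:
            yield LEXICON[token], (tweet_id, token)


def reduce_entities(pairs):
    """Reduce step: group the emitted mentions by entity type."""
    grouped = defaultdict(list)
    for entity_type, mention in pairs:
        grouped[entity_type].append(mention)
    return dict(grouped)


def run(tweets):
    """Apply the map step to every (tweet_id, text) pair, then reduce."""
    mapped = chain.from_iterable(map_tweet(i, t) for i, t in tweets)
    return reduce_entities(mapped)
```

Because each tweet is mapped independently, the map step could in principle be distributed across nodes, with only the grouping by entity type requiring a shuffle. That independence is what makes this kind of task a natural fit for a MapReduce-style framework.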
