STREAMING TWITTER DATA ANALYSIS USING SPARK FOR EFFECTIVE JOB SEARCH

Near-real-time big data from social networking sites such as Twitter and Facebook has become an attractive source for analytics research in recent years. It is timely, readily available, and popular, although its authenticity and accuracy can be questionable. Apache Spark, a popular big data processing engine that is typically faster than Hadoop MapReduce, can be used to find patterns in this data that are useful to ordinary users. Many organizations now advertise job vacancies through tweets, which reduces recruitment time and cost. This paper uses Spark to analyze the tweet stream in real time, filter job advertisements out of the millions of other streaming tweets, and classify them into job categories to support effective job search.
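The filter-then-classify step described above can be sketched in plain Python. This is a minimal keyword-based sketch under stated assumptions: the regular expressions used to spot job advertisements and the category keyword sets are hypothetical examples. The abstract does not say which features or classifier the paper actually uses, and a trained model would likely do better than keyword matching.

```python
import re

# Hypothetical patterns that suggest a tweet is a job advertisement (assumption).
JOB_AD_PATTERNS = [
    r"\bhiring\b",
    r"\bjob opening\b",
    r"\bvacanc(?:y|ies)\b",
    r"\bwe(?:'re| are) looking for\b",
    r"\bapply now\b",
    r"#jobs?\b",
]

# Hypothetical job categories and their keywords (assumption).
CATEGORY_KEYWORDS = {
    "software": {"developer", "engineer", "java", "python", "devops", "programmer"},
    "healthcare": {"nurse", "doctor", "pharmacist", "clinical", "physician"},
    "finance": {"accountant", "analyst", "banking", "auditor", "finance"},
    "sales": {"sales", "marketing", "retail", "cashier"},
}

_JOB_AD_RE = re.compile("|".join(JOB_AD_PATTERNS), re.IGNORECASE)
_TOKEN_RE = re.compile(r"[a-z]+")


def is_job_ad(text: str) -> bool:
    """Return True if the tweet matches any job-advertisement pattern."""
    return bool(_JOB_AD_RE.search(text))


def classify_job(text: str) -> str:
    """Return the category with the most keyword hits, or 'other' if none match.

    Ties go to the category listed first in CATEGORY_KEYWORDS.
    """
    tokens = _TOKEN_RE.findall(text.lower())
    best_category, best_score = "other", 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        score = sum(1 for token in tokens if token in keywords)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


def process_batch(tweets):
    """Keep only job ads from one micro-batch of tweet texts, grouped by category."""
    grouped = {}
    for text in tweets:
        if is_job_ad(text):
            grouped.setdefault(classify_job(text), []).append(text)
    return grouped


if __name__ == "__main__":
    batch = [
        "We're hiring a senior Python developer, remote! Apply now #jobs",
        "Great match last night, what a goal!",
        "Job opening: staff nurse at City Hospital, night shifts",
        "Vacancy for a trainee accountant, DM for details",
    ]
    for category, ads in process_batch(batch).items():
        print(category, len(ads))
```

In a Spark job, `is_job_ad` and `classify_job` could be wrapped as UDFs with `pyspark.sql.functions.udf` and applied to each micro-batch of a Structured Streaming DataFrame (for example inside `foreachBatch`), or passed to `filter` and `map` on a DStream. Keeping them as pure functions makes them easy to test outside the cluster.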
