A Scalable Framework for Multilevel Streaming Data Analytics using Deep Learning

The rapid growth of data in velocity, volume, value, variety, and veracity has created exciting new opportunities and posed major challenges for businesses of all types. Recently, there has been considerable interest in developing systems for processing continuous data streams, driven by the increasing need for real-time analytics for decision support in business, healthcare, manufacturing, and security. Streaming data analytics usually relies on the output of offline analytics on static or archived data. However, businesses and organizations, such as our industry partner Gnowit, strive to provide their customers with real-time market information and continuously look for a unified analytics framework that can seamlessly integrate streaming and offline analytics to extract knowledge from large volumes of hybrid streaming data. We present our study on designing a multilevel streaming text data analytics framework by comparing leading-edge, scalable, open-source, distributed, and in-memory technologies. We demonstrate the functionality of the framework for a use case of multilevel text analytics using deep learning for language understanding and sentiment analysis, including data indexing and query processing. Our framework combines Spark Streaming for real-time text processing, the Long Short-Term Memory (LSTM) deep learning model for higher-level sentiment analysis, and other tools for SQL-based analytical processing to provide a scalable solution for multilevel streaming text analytics.
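
The multilevel flow described above (real-time text processing, then a sentiment model, then SQL-based analytical queries) can be sketched in plain Python. This is a minimal illustration under several assumptions, not the framework's implementation: fixed-size micro-batches stand in for Spark Streaming's discretized streams, a hypothetical lexicon scorer (`lexicon_sentiment`) stands in for the trained LSTM model, and an in-memory SQLite table stands in for the analytical store. All names (`tokenize`, `micro_batches`, `run_pipeline`, the `docs` table) are illustrative.

```python
import re
import sqlite3
from typing import Callable, Iterable, Iterator, List

# Hypothetical word lists: a stand-in for the trained LSTM sentiment model,
# used only so this sketch runs without any model weights.
POSITIVE = {"gain", "growth", "strong", "up"}
NEGATIVE = {"loss", "decline", "weak", "down"}


def tokenize(text: str) -> List[str]:
    """Level 1: lightweight real-time text processing (lowercase word tokens)."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_sentiment(tokens: List[str]) -> float:
    """Level 2 stand-in: score in [-1, 1]; an LSTM classifier would go here instead."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an (unbounded) stream into fixed-size micro-batches."""
    batch: List[str] = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def run_pipeline(
    stream: Iterable[str],
    batch_size: int,
    scorer: Callable[[List[str]], float] = lexicon_sentiment,
) -> sqlite3.Connection:
    """Level 3: store scored documents in a SQL table for analytical queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (batch INTEGER, text TEXT, sentiment REAL)")
    for batch_id, batch in enumerate(micro_batches(stream, batch_size)):
        rows = [(batch_id, doc, scorer(tokenize(doc))) for doc in batch]
        conn.executemany("INSERT INTO docs VALUES (?, ?, ?)", rows)
        conn.commit()
    return conn


if __name__ == "__main__":
    news = [
        "Shares post strong growth",
        "Quarterly loss deepens decline",
        "Markets edge up",
    ]
    conn = run_pipeline(news, batch_size=2)
    query = "SELECT batch, AVG(sentiment) FROM docs GROUP BY batch ORDER BY batch"
    for batch_id, avg in conn.execute(query):
        print(batch_id, avg)  # prints "0 0.0" then "1 1.0"
```

Keeping the three levels behind separate functions means the `scorer` argument could be swapped for a call to a trained LSTM model without touching the batching or SQL levels, which mirrors the separation of concerns the framework describes.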
