Iktishaf+: A Big Data Tool with Automatic Labeling for Road Traffic Social Sensing and Event Detection Using Distributed Machine Learning

Digital societies can be characterized by their increasing desire to express themselves and interact with others. This is being realized through digital platforms such as social media, which in many sectors of smart societies have become convenient and inexpensive sensors compared with physical sensors. One such major sector is road transportation, the backbone of modern economies, which globally accounts for 1.25 million deaths and 50 million injuries annually. Research on big data-enabled social media analytics for transportation remains limited. This paper brings a range of technologies together to detect road traffic-related events using big data and distributed machine learning. The main contribution of this research is an automatic labeling method for machine learning-based detection of traffic-related events from Arabic-language Twitter data. The proposed method has been implemented in a software tool called Iktishaf+ (an Arabic word meaning "discovery") that automatically detects traffic events from Arabic tweets using distributed machine learning over Apache Spark. The tool is built from nine components using a range of technologies, including Apache Spark, Parquet, and MongoDB. Iktishaf+ uses a light stemmer for Arabic that we developed. We also use a location extractor, likewise developed by us, to extract and visualize spatio-temporal information about the detected events. The data used in this work comprise 33.5 million tweets collected from Saudi Arabia using the Twitter API. Using support vector machine, naïve Bayes, and logistic regression classifiers, we detect and validate several real events in Saudi Arabia without prior knowledge of them, including a fire in Jeddah, rains in Makkah, and an accident in Riyadh. The findings show that Twitter is effective for detecting important events even when nothing is known about them in advance.
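One simple way to label tweets automatically, shown here as a minimal sketch and not as the Iktishaf+ labeling method itself, is to match each tweet against a seed lexicon of event keywords and assign the label with the most hits. The lexicon `LEXICON` and the helpers `tokens` and `auto_label` are hypothetical; the sketch assumes light normalization (removing short-vowel marks, unifying alef forms, dropping a leading "ال") is enough to match keywords.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical seed lexicon: event label -> Arabic keywords written without
# the definite article "ال". It is illustrative, not the Iktishaf+ lexicon.
LEXICON = {
    "accident": {"حادث", "تصادم"},
    "fire": {"حريق"},
    "rain": {"مطر", "امطار"},
    "congestion": {"زحام", "ازدحام"},
}
DIACRITICS = re.compile(r"[\u064B-\u0652]")    # Arabic short-vowel marks
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")  # runs of Arabic letters


def tokens(text: str) -> List[str]:
    """Split a tweet into Arabic words, unify alef forms, drop a leading 'ال'."""
    words = []
    for tok in ARABIC_WORD.findall(DIACRITICS.sub("", text)):
        tok = re.sub("[أإآ]", "ا", tok)
        if tok.startswith("ال") and len(tok) > 3:
            tok = tok[2:]
        words.append(tok)
    return words


def auto_label(text: str) -> Optional[str]:
    """Label a tweet with the event type whose keywords it hits most often,
    or return None when no keyword matches."""
    hits = Counter()
    for tok in tokens(text):
        for label, keywords in LEXICON.items():
            if tok in keywords:
                hits[label] += 1
    return hits.most_common(1)[0][0] if hits else None


print(auto_label("حادث مروري على طريق الملك فهد في الرياض"))  # accident
print(auto_label("صباح الخير"))  # None
```

Tweets labeled this way can serve as training data for supervised classifiers, while tweets with no keyword hit can be held out. Keyword labels are noisy (for example, "حادث" also covers non-traffic incidents), which is one reason to train a classifier on them rather than use the lexicon directly.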
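A light stemmer strips common Arabic prefixes and suffixes instead of extracting the full root, so inflected forms of a word map to one feature. The following is a minimal sketch in that spirit; the affix lists `PREFIXES` and `SUFFIXES` and the functions `normalize` and `light_stem` are assumptions modelled on common Arabic light stemmers, not the stemmer developed for Iktishaf+.

```python
import re

# Hypothetical affix lists modelled on common Arabic light stemmers; the
# authors' own stemmer may use different lists and rules.
PREFIXES = ("بال", "كال", "فال", "لل", "ال")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")
DIACRITICS_AND_TATWEEL = re.compile(r"[\u064B-\u0652\u0640]")


def normalize(word: str) -> str:
    """Remove short-vowel marks and tatweel, and unify alef variants."""
    word = DIACRITICS_AND_TATWEEL.sub("", word)
    return re.sub("[أإآ]", "ا", word)


def light_stem(word: str) -> str:
    word = normalize(word)
    # Drop the conjunction "و" only if at least three letters remain.
    if word.startswith("و") and len(word) >= 4:
        word = word[1:]
    # Strip at most one article-like prefix, keeping two or more letters.
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break
    # One pass over the suffix list in order; several suffixes may be
    # removed, each only if two or more letters remain.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
    return word


print(light_stem("السيارات"))  # سيار
```

The design trades precision for speed and simplicity: it needs no dictionary, but it can over-strip words whose root begins with an affix letter (for example, "وزير" becomes "زير").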
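On Apache Spark, classifiers of the kinds named above are available as the `NaiveBayes`, `LogisticRegression`, and `LinearSVC` estimators in `pyspark.ml.classification`. To show what the naïve Bayes model computes, here is a single-machine sketch of multinomial naïve Bayes with Laplace smoothing, assuming tweets that are already tokenized, stemmed, and labeled. The functions `train_nb` and `predict_nb` and the toy training set are hypothetical and stand in for the distributed version.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Model = Dict[str, Tuple[float, Dict[str, float]]]


def train_nb(docs: List[Tuple[List[str], str]], alpha: float = 1.0) -> Model:
    """Fit multinomial naive Bayes with additive (Laplace) smoothing."""
    doc_counts = Counter()
    term_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for toks, label in docs:
        doc_counts[label] += 1
        term_counts[label].update(toks)
        vocab.update(toks)
    n_docs = sum(doc_counts.values())
    model: Model = {}
    for label, n in doc_counts.items():
        denom = sum(term_counts[label].values()) + alpha * len(vocab)
        log_prior = math.log(n / n_docs)
        log_lik = {w: math.log((term_counts[label][w] + alpha) / denom) for w in vocab}
        model[label] = (log_prior, log_lik)
    return model


def predict_nb(model: Model, toks: List[str]) -> str:
    """Return the label maximizing log P(label) + sum of log P(token | label)."""
    def score(label: str) -> float:
        log_prior, log_lik = model[label]
        # Tokens outside the training vocabulary are ignored.
        return log_prior + sum(log_lik[w] for w in toks if w in log_lik)
    return max(model, key=score)


# Hypothetical toy training set of stemmed, automatically labeled tweets.
train = [
    (["حادث", "طريق", "سيار"], "accident"),
    (["تصادم", "طريق"], "accident"),
    (["امطار", "سيول"], "rain"),
    (["مطر", "امطار"], "rain"),
    (["حريق", "دخان"], "fire"),
]
model = train_nb(train)
print(predict_nb(model, ["حادث", "سيار"]))  # accident
```

Working in log space avoids underflow when tweets contain many tokens, and smoothing keeps a single unseen class-token pair from zeroing out a label's score.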
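The spatio-temporal view of detected events can be illustrated roughly as follows, assuming a small hypothetical gazetteer of city names and a fixed per-hour count threshold, neither taken from Iktishaf+. Labeled tweets are grouped by city, event label, and hour, and buckets whose volume reaches the threshold are reported as candidate events. `GAZETTEER`, `extract_city`, `detect_events`, and `threshold` are illustrative names.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical gazetteer: Arabic surface form -> canonical city name.
GAZETTEER = {
    "جدة": "Jeddah", "جده": "Jeddah",
    "مكة": "Makkah", "مكه": "Makkah",
    "الرياض": "Riyadh",
}
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")
Bucket = Tuple[str, str, datetime]


def extract_city(text: str) -> Optional[str]:
    """Return the first gazetteer city mentioned, allowing one attached
    clitic prefix such as "ب" (in) or "و" (and) before the name."""
    for tok in ARABIC_WORD.findall(text):
        candidates = [tok]
        if len(tok) > 3 and tok[0] in "وبلف":
            candidates.append(tok[1:])
        for cand in candidates:
            if cand in GAZETTEER:
                return GAZETTEER[cand]
    return None


def detect_events(tweets: Iterable[Tuple[str, Optional[str], str]],
                  threshold: int = 3) -> Dict[Bucket, int]:
    """Count labeled, located tweets per (city, label, hour) and keep the
    buckets whose count reaches the threshold."""
    counts: Counter = Counter()
    for timestamp, label, text in tweets:
        city = extract_city(text)
        if label is None or city is None:
            continue
        hour = datetime.fromisoformat(timestamp).replace(minute=0, second=0, microsecond=0)
        counts[(city, label, hour)] += 1
    return {bucket: n for bucket, n in counts.items() if n >= threshold}


tweets = [
    ("2019-01-01T08:05:00", "accident", "حادث على طريق الملك فهد بالرياض"),
    ("2019-01-01T08:20:00", "accident", "تصادم في الرياض"),
    ("2019-01-01T08:45:00", "accident", "زحمة بسبب حادث الرياض"),
    ("2019-01-01T09:10:00", "rain", "أمطار غزيرة في مكة"),
]
print(detect_events(tweets))  # {('Riyadh', 'accident', datetime.datetime(2019, 1, 1, 8, 0)): 3}
```

A fixed threshold is the simplest choice; comparing each bucket against that city's usual volume would adapt better to cities with very different tweet rates.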
