A Big Data Approach to Black Friday Sales

Retail companies recognize the need to analyze and predict their sales and customer behavior across their products and product categories. Our study aims to help retail companies create personalized deals and promotions for their customers, even during the COVID-19 pandemic, through a big data framework that allows them to handle massive sales volumes with more efficient models. In this paper, we used Black Friday sales data taken from a dataset on the Kaggle website, which contains nearly 550,000 observations described by 10 features, both qualitative and quantitative. The target label is the purchase (sales) amount in U.S. dollars. Because the target label is continuous, regression models are suited to this case. Using the Apache Spark big data framework with its MLlib machine learning library, we trained two machine learning models: linear regression and random forest. These models were used to predict future pricing and sales. We first implemented a linear regression model and a random forest model without using the Spark framework and achieved accuracies of 68% and 74%, respectively. Then, we trained these models on the Spark machine learning big data framework, where we achieved an accuracy of 72% for the linear regression model and 81% for the random forest model. © 2021, Tech Science Press. All rights reserved.
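The abstract reports "accuracy" for regression models but does not say which metric is meant. For a continuous target, one common choice is the coefficient of determination (R^2). The sketch below is only an illustration under that assumption: it fits a one-feature least-squares line in plain Python and scores it with R^2. The data values and the function names `fit_simple_linear_regression` and `r_squared` are hypothetical. They are not taken from the paper, the Kaggle dataset, or Spark MLlib.

```python
# Toy illustration only: the numbers below are invented, not from the Kaggle dataset.

def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(ys, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical numeric feature and purchase amounts in U.S. dollars.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
purchase = [52.0, 61.0, 68.0, 81.0, 88.0]

slope, intercept = fit_simple_linear_regression(feature, purchase)
predicted = [slope * x + intercept for x in feature]
score = r_squared(purchase, predicted)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={score:.3f}")
```

An R^2 close to 1 means the model explains most of the variance in the purchase amount. Whether the percentages quoted in the abstract correspond to this metric is not stated in the source.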
