A review of systematic evaluation and improvement in the big data environment

The era of big data brings unprecedented opportunities and challenges to management research. Evaluation, one of the core functions of management decision-making, has taken on new roles and a wider range of applications, and identifying evaluation methods suited to the big data environment has become an important research topic. This paper provides an overview and discussion of systematic evaluation and improvement in the big data environment. We first review evaluation methods built on the main analytic techniques of big data, such as data mining, statistical methods, optimization and simulation, and deep learning. We then present evaluation methods that address four characteristics of big data: association features, data loss, data noise, and visualization. Furthermore, we explore studies of systematic improvement and their application fields. Finally, we analyze new application areas of evaluation methods and outline future directions for evaluation method research in the big data environment from six aspects. We hope this review provides meaningful insights for subsequent research.
