Improving the outlier detection method in concrete mix design by combining the isolation forest and local outlier factor

Abstract Rapid development in the construction industry generates large amounts of concrete data that are measured and analyzed every day, given that concrete is the second most used material on earth. Concrete is made from numerous ingredients that vary considerably at both the design stage and the testing stage. The main goal of this paper is to quantify the anomalies and outliers that arise during the design phase of concrete mixtures. Concrete mixtures contain varying percentages of ingredients such as cement, slag, fly ash, water, superplasticizer, and fine and coarse aggregates. Machine learning and data mining are thriving topics in many research fields, yet their implementation in the construction industry is still limited. The concrete community needs such tools to design concrete mixtures efficiently. Outliers can occur during the evaluation of sample measurements, which may include human or system errors. The Local Outlier Factor (LOF) algorithm is the most common method used to detect outliers; however, the LOF has some limitations. In this paper, an anomaly-based outlier detection algorithm called Isolation Forest based on a Sliding window for the Local Outlier Factor (IFS-LOF) is proposed to address the limitations of the LOF in evaluating 1030 concrete mixtures. The proposed algorithm works without any prior knowledge of the data distribution and runs within limited memory and with minimal computational effort. The evaluation showed that the IFS-LOF algorithm is more efficient in detecting sequences of outliers and achieves higher accuracy than other state-of-the-art LOF algorithms.
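As a rough illustration of the LOF baseline named in the abstract, the sketch below computes standard Local Outlier Factor scores in plain Python. It is not the IFS-LOF algorithm itself: the isolation forest stage and the sliding window are not shown, and the two-feature toy mixtures (cement and water in kg/m^3), the function names `k_nearest` and `lof_scores`, and the choice `k=3` are all assumptions made for this example.

```python
import math

def k_nearest(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    # Simplification: exactly k neighbours are kept, ties at the k-distance are cut.
    return dists[:k]

def lof_scores(points, k=3):
    """Compute a Local Outlier Factor score for every point (higher = more anomalous)."""
    n = len(points)
    neigh = [k_nearest(points, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbour

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        mean_reach = max(sum(reach) / len(reach), 1e-12)  # guard against duplicates
        lrd.append(1.0 / mean_reach)

    # LOF: mean density of the neighbours relative to the point's own density.
    return [
        sum(lrd[j] for _, j in neigh[i]) / (k * lrd[i]) for i in range(n)
    ]

# Hypothetical toy data: (cement, water); the last mixture is deliberately unusual.
points = [
    (300, 180), (310, 185), (305, 182),
    (295, 178), (302, 181), (308, 184),
    (540, 120),
]
scores = lof_scores(points, k=3)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print(most_anomalous, [round(s, 2) for s in scores])
```

Scores close to 1 indicate a point whose local density matches that of its neighbours, while scores well above 1 flag candidates for outliers. Every score here requires a neighbour search over the whole data set, which is the kind of memory and computation cost that a windowed, pre-filtered variant such as the one proposed would aim to reduce.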
