Gradient Boosted Decision Trees for Lithology Classification

Abstract: Classifying the lithology of underground formations is crucial for petroleum exploration and engineering because it underpins geological research and reservoir parameter calculations. Efforts to automate lithology classification have therefore increased recently, driven by cheaper computing power and the availability of open-source machine learning libraries, which allow large volumes of well log data to be analyzed efficiently and with higher accuracy. Recent studies evaluated machine learning methods for classifying formation lithology using data from the Daniudui gas field (DGF) and the Hanginqi gas field (HGF). Although the algorithms used in those studies performed well, there is still scope to improve their predictive ability and scalability. Because the boosted decision tree learners in those studies gave encouraging results, we drew on the state of the art in boosting and implemented algorithms that scale to large datasets. Specifically, we applied XGBoost, LightGBM and CatBoost, which belong to the family of gradient boosted decision trees (GBDTs). After combining the well log data from DGF and HGF, we compared their performance with that of other tree-based machine learning algorithms, namely decision trees (DTs), random forests (RFs), extremely randomized trees (ERTs), AdaBoost and gradient boosting machines (GBMs). We tuned the hyperparameters and then evaluated the resulting models on the test set using the micro, macro and weighted averages of precision (Pr), recall (Re) and F1-score (F1). Among the algorithms applied, LightGBM achieved the highest scores on these metrics. Our work identifies LightGBM and CatBoost as good first-choice algorithms for the supervised classification of lithology from well log data.
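The evaluation protocol described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes scikit-learn is available and uses a synthetic dataset from `make_classification` as a stand-in for the well log data. The feature count, class count, and hyperparameters are arbitrary illustrative choices, and no tuning is performed. Only the scikit-learn tree-based baselines are included; XGBoost, LightGBM and CatBoost expose compatible `fit`/`predict` classifiers in their own packages and could be added to the same dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for well log features and lithology labels (assumption:
# 7 features and 4 classes are arbitrary, not taken from the paper).
X, y = make_classification(
    n_samples=600, n_features=7, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tree-based baselines named in the abstract, with untuned settings.
models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "ERT": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "GBM": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

AVERAGES = ("micro", "macro", "weighted")


def evaluate(model):
    """Fit on the training split and return {average: (Pr, Re, F1)} on the test split."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores = {}
    for avg in AVERAGES:
        pr, re, f1, _ = precision_recall_fscore_support(
            y_test, y_pred, average=avg, zero_division=0
        )
        scores[avg] = (pr, re, f1)
    return scores


results = {name: evaluate(model) for name, model in models.items()}

for name, scores in results.items():
    row = "  ".join(
        f"{avg}: Pr={pr:.3f} Re={re:.3f} F1={f1:.3f}"
        for avg, (pr, re, f1) in scores.items()
    )
    print(f"{name:9s} {row}")
```

For single-label multiclass problems the micro-averaged precision, recall and F1 all equal the overall accuracy, so the macro and weighted averages are the ones that show how a model handles less frequent classes.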
