The success of machine learning (ML) techniques across industries relies heavily on operator expertise and domain knowledge, which are used to manually choose an algorithm and set its parameters for a given problem. Because model selection and parameter tuning are done manually, the quality of this process cannot be quantified or evaluated, which in turn limits the ability to perform comparison studies between different algorithms. In this study, we propose a new hybrid approach for developing machine learning workflows that automates algorithm selection and hyperparameter optimization. The proposed approach provides a robust, reproducible, and unbiased workflow that can be quantified and validated using different scoring metrics. We compare the workflows most commonly used when applying artificial intelligence (AI) and ML to engineering problems, namely grid/random search, Bayesian search and optimization, and genetic programming, with our new hybrid approach, which integrates the Tree-based Pipeline Optimization Tool (TPOT) with Bayesian optimization. The performance of each workflow is quantified using scoring metrics such as the Pearson-correlation-based R² and the mean squared error (MSE). For this purpose, actual field data from 1567 gas wells in the Marcellus Shale, with 121 features covering reservoir, drilling, completion, stimulation, and operation, are tested with each of the workflows. The proposed hybrid workflow is then used to evaluate the type well used for assessing Marcellus Shale gas production. Our automated hybrid approach showed significant improvement over the other workflows on both scoring metrics.
The new hybrid approach provides a practical tool for automated model and hyperparameter selection. It has been tested on real field data and can be applied to other engineering problems that use artificial intelligence and machine learning. The new hybrid model is tested in a real field and compared with conventional type wells developed by field engineers. The type well of the field is found to be very close to the P50 predictions for the field, which indicates that the completion design performed by the field engineers was successful. The results also show that the field average production could have been improved by 8% if shorter cluster spacing and higher proppant loading per cluster had been used during the frac jobs.
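As a rough illustration of two of the baseline workflows named above (grid search and random search) and of the two scoring metrics (MSE and R²), the following minimal sketch tunes a single hyperparameter on synthetic data. It is not the hybrid TPOT plus Bayesian optimization workflow of this study. The one-feature ridge model, the penalty `alpha`, the synthetic data, and all function names are assumptions introduced only for illustration, and R² is computed here as the coefficient of determination.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_ridge_1d(x, y, alpha):
    """Closed-form fit of y ~ w*x + b with an L2 penalty alpha on w only."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    w = sxy / (sxx + alpha)
    return w, mean_y - w * mean_x


def evaluate(alpha, train, valid):
    """Fit on the training split, return (MSE, R2) on the validation split."""
    w, b = fit_ridge_1d(train[0], train[1], alpha)
    pred = [w * a + b for a in valid[0]]
    return mse(valid[1], pred), r2(valid[1], pred)


def search(candidates, train, valid):
    """Return (alpha, MSE, R2) for the candidate with the lowest validation MSE."""
    scored = [(alpha, *evaluate(alpha, train, valid)) for alpha in candidates]
    return min(scored, key=lambda s: s[1])


# Synthetic data (hypothetical): y = 3x + 2 plus Gaussian noise.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [3.0 * a + 2.0 + rng.gauss(0, 1.0) for a in x]
train, valid = (x[:150], y[:150]), (x[150:], y[150:])

# Grid search: a fixed list of candidate penalties.
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
# Random search: penalties sampled log-uniformly between 1e-3 and 1e2.
random_candidates = [10 ** rng.uniform(-3, 2) for _ in range(20)]

grid_best = search(grid, train, valid)
random_best = search(random_candidates, train, valid)
print("grid   best alpha, MSE, R2:", grid_best)
print("random best alpha, MSE, R2:", random_best)
```

Scoring every workflow on the same held-out split with the same metrics is what makes the comparison between search strategies quantifiable, which is the point the abstract makes about manual tuning.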