Evolving Neural Conditional Random Fields for drilling report classification

Abstract: Oil and gas prospecting is an important economic activity, but it is also expensive and complex, and it requires close monitoring to avoid work accidents and, above all, environmental damage. An essential source of information is the daily drilling report, which contains technical interpretations of operations and additional information from rig sensors. However, only a few works have focused on mining textual information from such reports to provide intelligent decision-making mechanisms that support safety and efficiency in drilling operations. This work proposes a context-driven approach based on Recurrent Neural Networks to recognize events in drilling reports that can outperform related techniques. We also introduce a novel approach based on evolutionary computing to combine partially trained models trained with cyclical learning rates. Experiments conducted on two unbalanced datasets provided by Petrobras (Petroleo Brasileiro S.A.) show that our model improved Macro-F1 scores over the baseline by more than 47%. In addition, the proposed ensembling technique further improved these values by another 3% in the best scenario. Such promising results can shed light on new research directions in the field.
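
To make the ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a cosine-annealed cyclical learning rate (one model snapshot would be saved at the end of each cycle), computes Macro-F1 as the unweighted mean of per-class F1 scores, and uses a simple mutation-only evolutionary search to weight the snapshots' class probabilities. All function names, the mutation scale, and the population settings are hypothetical choices made for illustration.

```python
import math
import random


def cyclical_lr(step, total_steps, cycles, lr_max):
    """Cosine-annealed cyclical learning rate (assumed schedule).

    The rate restarts at lr_max at the beginning of each cycle and decays
    towards zero at its end, where a snapshot would be saved.
    """
    cycle_len = math.ceil(total_steps / cycles)
    pos = step % cycle_len
    return lr_max / 2 * (math.cos(math.pi * pos / cycle_len) + 1)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def ensemble_predict(weights, snapshot_probs):
    """Weighted sum of class probabilities; snapshot_probs[m][i][c] is the
    probability of class index c for sample i under snapshot m."""
    n_samples = len(snapshot_probs[0])
    n_classes = len(snapshot_probs[0][0])
    preds = []
    for i in range(n_samples):
        mixed = [
            sum(w * probs[i][c] for w, probs in zip(weights, snapshot_probs))
            for c in range(n_classes)
        ]
        preds.append(max(range(n_classes), key=mixed.__getitem__))
    return preds


def evolve_weights(snapshot_probs, y_true, labels,
                   generations=30, pop_size=20, seed=0):
    """Mutation-only evolutionary search over snapshot weights (illustrative).

    The population starts with uniform weights plus random candidates; each
    generation keeps the better half and refills with mutated copies.
    """
    rng = random.Random(seed)
    m = len(snapshot_probs)

    def fitness(w):
        return macro_f1(y_true, ensemble_predict(w, snapshot_probs), labels)

    pop = [[1.0 / m] * m]
    pop += [[rng.random() for _ in range(m)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = [
            [max(0.0, w + rng.gauss(0, 0.1)) for w in rng.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness)
```

In practice the weights would be evolved on a held-out validation split and the resulting ensemble evaluated on a separate test split; because the better half of the population is always retained, the selected weights never score below the uniform average on the data used for the search.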
