Corrosion in oil wells results from complex interactions among factors such as the chemical composition, temperature, pressure, and flow rates of the fluids involved, and it poses a significant challenge to the oil and gas industry. Accurate prediction of corrosion rates is essential for operational efficiency, proactive maintenance, and risk mitigation. Traditional prediction methods rely on empirical models, which are often reactive and inaccurate; machine learning models can provide more accurate corrosion-rate predictions [1, 2]. The objective of this study is to develop a hybrid model that combines a physics-based understanding of corrosion processes (first principles) with the data-driven capabilities of machine learning. By merging these two approaches, we seek to improve the accuracy and adaptability of corrosion-rate predictions, enabling timely maintenance, reduced downtime, and enhanced safety and operational efficiency.

Our methodology began with the collection of historical data on corrosion rates, well operational parameters, pipe parameters, and fluid compositions from a representative set of oil wells (over 1,000 records). We cleaned and transformed the data, handling missing values and outliers appropriately. To integrate first principles, we employed the NORSOK corrosion model [3], which is based on fundamental chemical and physical principles, to calculate theoretical corrosion rates. Machine learning models, including Random Forest, Gradient Boosting, and Support Vector Machines, were then trained on the same dataset to predict corrosion rates, and the best-performing model was selected using metrics such as Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). Finally, we combined the predictions of the first-principles model and the selected machine learning model using an ensemble approach, and the resulting hybrid model was evaluated on a held-out testing dataset.
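The abstract does not specify the model configurations, features, or library used; the following is a minimal sketch of the model-comparison step, assuming scikit-learn and synthetic placeholder data in place of the well, pipe, and fluid-composition features described above. All hyperparameters shown are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder data; in the study, X would hold well operational, pipe, and
# fluid-composition features, and y the measured corrosion rates.
X, y = make_regression(n_samples=1000, n_features=10, noise=5.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate models from the study; settings here are illustrative, not the authors'.
candidates = {
    "random_forest": RandomForestRegressor(n_estimators=300, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
    "svm": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
}

best_name, best_model, best_mae = None, None, float("inf")
for name, model in candidates.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, pred)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    print(f"{name}: MAE={mae:.3f}  RMSE={rmse:.3f}")
    if mae < best_mae:  # select on MAE; RMSE is reported alongside
        best_name, best_model, best_mae = name, model, mae

print(f"best model: {best_name}")
```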

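The abstract likewise does not detail the ensemble scheme used to merge the two predictors. One minimal reading is a weighted average of the NORSOK prediction and the ML prediction, with the weight tuned on a validation set; the sketch below assumes that form, and the names `cr_norsok`, `cr_ml`, and `w` are hypothetical.

```python
import numpy as np

def hybrid_predict(cr_norsok: np.ndarray, cr_ml: np.ndarray, w: float) -> np.ndarray:
    """Weighted average of physics-based (NORSOK) and ML corrosion-rate predictions."""
    return w * cr_norsok + (1.0 - w) * cr_ml

def fit_weight(cr_norsok_val, cr_ml_val, y_val, grid=np.linspace(0.0, 1.0, 101)):
    """Pick the blend weight that minimizes MAE on a held-out validation set."""
    maes = [np.mean(np.abs(hybrid_predict(cr_norsok_val, cr_ml_val, w) - y_val))
            for w in grid]
    return float(grid[int(np.argmin(maes))])

# Hypothetical usage, given validation-set predictions from both models:
#   w = fit_weight(cr_norsok_val, cr_ml_val, y_val)
#   cr_hybrid = hybrid_predict(cr_norsok_test, best_model.predict(X_test), w)
```

A weighted average is only one possible ensemble; stacking (using both predictions as inputs to a meta-model) would be an equally plausible reading of the abstract.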