Machine Learning Driven Groundwater Quality Classification with Model Interpretability Using SHAP

Authors

  • Avni Yadav, M.Tech Scholar, Vaishnavi Institutes of Technology and Science, Bhopal
  • Jayshree Boaddh, HOD, CSE, Vaishnavi Institutes of Technology and Science, Bhopal
  • Rahul Patidar, Asst. Prof., Vaishnavi Institutes of Technology and Science, Bhopal

DOI:

https://doi.org/10.69968/ijisem.2026v5i146-55

Keywords:

Groundwater Quality, Machine Learning, CatBoost, Irrigation Suitability, Binary Classification, SHAP

Abstract

The experimental evaluation of the proposed CatBoost-based groundwater quality classification system demonstrates strong discriminative power and good generalization. The model achieves an accuracy of 0.9730, meaning that the large majority of groundwater samples are assigned to the correct one of the two classes. This high accuracy is attributed to the model's ability to learn complex nonlinear relationships among hydrochemical parameters and to its consistent performance across the training and testing datasets. The F1-score of 0.9375 further indicates a good balance between precision and recall. This balance matters in groundwater quality assessment, where class imbalance is common and misclassification can carry serious environmental and agricultural risks. Analysis of the confusion matrix reinforces these results: false negatives were rare, which reduces the likelihood that unsatisfactory groundwater is wrongly classified as good. Such dependability is important for safeguarding irrigation practices and public health. Beyond predictive performance, interpretability analysis with SHapley Additive exPlanations (SHAP) shows that salinity-related parameters, sodium hazard indicators, groundwater level conditions, and dissolved constituents are the main drivers of groundwater suitability. The agreement of these influential features with established hydrogeochemical knowledge supports the scientific reliability of the model. Overall, the achieved accuracy and F1-score, together with transparent interpretability, indicate that the proposed system is effective and suitable for practical application in real-world groundwater quality management.
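The two kinds of quantities the abstract relies on, confusion-matrix metrics and SHAP attributions, can be illustrated with a short, self-contained Python sketch. It is not the authors' code. The confusion-matrix counts are hypothetical and are not the paper's results. The three features (EC for salinity, SAR for sodium hazard, GWL for groundwater level) and the `toy_score` function are invented stand-ins for a trained CatBoost model. The first function derives accuracy, precision, recall and F1 from confusion-matrix counts. The second computes exact Shapley values by enumerating every feature coalition against a single reference sample, which is a simplified form of the quantity SHAP estimates. The shap library's tree explainer uses a faster algorithm specialized for tree ensembles rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall, equivalently 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, precision, recall, f1


def exact_shapley(model, x, background):
    """Exact Shapley values of model(x) relative to one background (reference) sample.

    A coalition S is scored by evaluating the model with features in S taken
    from x and all remaining features taken from the background sample.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else background[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical stand-in for a trained classifier's raw score (higher = less suitable).
# Feature order: EC (salinity), SAR (sodium hazard), GWL (groundwater level).
def toy_score(z):
    ec, sar, gwl = z
    return 0.002 * ec + 0.05 * sar + 0.01 * gwl + 0.0001 * ec * sar


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, not the paper's results.
    acc, prec, rec, f1 = classification_metrics(tp=40, fp=3, fn=5, tn=152)
    print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")

    x = [1500.0, 8.0, 12.0]          # hypothetical sample to explain
    background = [600.0, 3.0, 6.0]   # hypothetical reference sample
    phi = exact_shapley(toy_score, x, background)
    for name, contribution in zip(["EC", "SAR", "GWL"], phi):
        print(f"{name}: {contribution:+.4f}")
    # Local accuracy: the attributions sum to the score difference.
    print(sum(phi), toy_score(x) - toy_score(background))
```

With these hypothetical counts, accuracy is 0.96 while F1 is about 0.91. This shows how F1 can sit below accuracy when the positive class is the minority. The Shapley values sum to `toy_score(x) - toy_score(background)`. This additivity (local accuracy) property is what lets SHAP attributions be read as per-feature contributions to a single prediction.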

Published

14-01-2026

Section

Articles

How to Cite

Avni Yadav et al. 2026. Machine Learning Driven Groundwater Quality Classification with Model Interpretability Using SHAP. International Journal of Innovations in Science, Engineering And Management. 5, 1 (Jan. 2026), 46–55. DOI: https://doi.org/10.69968/ijisem.2026v5i146-55.