XGBoost for Heart Disease Prediction: Achieving High Accuracy with Robust Machine Learning Techniques
DOI:
https://doi.org/10.69968/ijisem.2025v4i3185-191
Keywords:
Heart disease, Machine learning, XGBoost, Data integration, Coronary artery disease, Predictive modeling, Early diagnosis
Abstract
This study presents a comprehensive heart disease dataset of 1190 cases with 11 shared features, combined from five well-known datasets: Cleveland, Hungarian, Switzerland, Long Beach VA, and Statlog. This makes it the largest dataset of its kind for research on coronary artery disease (CAD). The goal was to develop a robust machine learning model that reliably predicts heart disease and thereby supports early detection. During exploratory data analysis we removed null values; we then split the data 80:20 into training and test sets and applied standard scaling so that the features were on a consistent scale. We compared eight machine learning methods: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbours, Gradient Boosting, AdaBoost, and XGBoost. XGBoost, tuned by grid search with 5-fold cross-validation, performed best, with a test accuracy of 0.966, precision of 0.967, and recall of 0.966. It produced only three false positives and one false negative. Its high recall for positive cases (0.986) suggests that the approach may be useful in clinical settings. By providing a new combined dataset and an effective predictive model, this work contributes to CAD diagnosis and may support earlier identification and treatment.
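The accuracy, precision, and recall figures quoted in the abstract are all functions of confusion-matrix counts. As a minimal sketch of that relationship, the Python below computes the three metrics for the positive class. The false-positive and false-negative counts (3 and 1) are taken from the abstract; the true-positive and true-negative counts are hypothetical placeholders, since the abstract does not report them, so the printed values are not expected to reproduce the published ones. The function name `confusion_metrics` is likewise an illustrative choice, not something from the study.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return accuracy, precision, and recall for the positive class."""
    total = tp + fp + fn + tn
    if total == 0 or (tp + fp) == 0 or (tp + fn) == 0:
        raise ValueError("counts must give non-zero denominators")
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total,
        # Share of predicted positives that were truly positive.
        "precision": tp / (tp + fp),
        # Share of true positives that the model found.
        "recall": tp / (tp + fn),
    }


# fp and fn come from the abstract; tp and tn are HYPOTHETICAL placeholders.
metrics = confusion_metrics(tp=100, fp=3, fn=1, tn=100)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a screening context a false negative (a missed case) is usually the costlier error, which is why the abstract singles out recall for positive cases.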
License
Copyright (c) 2025 Ashish Kumar Parashar, Anita Jamliya, Sama Nasrat, Rajesh Soni

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Re-users must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. This license allows for redistribution, commercial and non-commercial, as long as the original work is properly credited.