Accurate PM2.5 Prediction Using Machine Learning Ensembles for Sustainable Smart Cities
DOI:
https://doi.org/10.69968/ijisem.2025v4i410-17
Keywords:
Air quality prediction, PM2.5, machine learning, hybrid ensemble, Random Forest, Extra Trees, smart cities, sustainability, AQI, India
Abstract
Air pollution is a major obstacle to urban sustainability, particularly in India, where cities such as Delhi suffer from serious air quality problems, and smart city projects need precise prediction models to address it. Using the Central Pollution Control Board's Air Quality Data in India (2015–2020) dataset, this study, "Air Quality Prediction for Sustainable Smart Cities using Machine Learning", develops a framework for predicting PM2.5 concentrations across 26 Indian cities. The dataset was prepared for modelling through one-hot encoding of city variables, IQR-based outlier treatment, imputation of missing values with SimpleImputer, and exploratory data analysis to identify trends. Four machine learning models were trained: Support Vector Regressor, Gradient Boosting Regressor, Random Forest Regressor, and Extra Trees Regressor. A hybrid ensemble that combined Random Forest and Extra Trees through a voting mechanism performed best (R² = 0.9818, RMSE = 11.8222 µg/m³). The ensemble was robust across a variety of metropolitan settings, outperforming the baseline models by 0.3 to 11.88% in R² for Hyderabad, Bengaluru, Kolkata, and Delhi. With potential for worldwide use, the system supports real-time air quality management by providing precise AQI derivation and visualisation dashboards, contributing to environmental sustainability, urban planning, and public health in smart cities.
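The preprocessing and voting steps named in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, and the column names, the 1.5 × IQR clipping rule (clipping rather than dropping rows), the median imputation strategy, and the hyperparameters are all hypothetical choices made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor, VotingRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def clip_iqr(df, columns, k=1.5):
    """Clip each column to [Q1 - k*IQR, Q3 + k*IQR]; NaNs are left untouched."""
    out = df.copy()
    for col in columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


# Synthetic stand-in for the real dataset (assumption: these columns are illustrative).
rng = np.random.default_rng(0)
n = 600
data = pd.DataFrame({
    "City": rng.choice(["Delhi", "Kolkata", "Bengaluru", "Hyderabad"], size=n),
    "PM10": rng.gamma(4.0, 30.0, size=n),
    "NO2": rng.gamma(3.0, 10.0, size=n),
    "CO": rng.gamma(2.0, 0.6, size=n),
})
data["PM2.5"] = (0.5 * data["PM10"] + 0.4 * data["NO2"] + 5.0 * data["CO"]
                 + rng.normal(0.0, 5.0, size=n))
# Knock out about 5% of one feature to exercise the imputer.
data.loc[rng.random(n) < 0.05, "NO2"] = np.nan

numeric = ["PM10", "NO2", "CO"]
# For brevity the IQR bounds use the whole table; a stricter setup would
# compute them on the training split only.
data = clip_iqr(data, numeric + ["PM2.5"])

X = data[numeric + ["City"]]
y = data["PM2.5"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("city", OneHotEncoder(handle_unknown="ignore"), ["City"]),
])
# VotingRegressor averages the predictions of its member regressors.
ensemble = VotingRegressor([
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("et", ExtraTreesRegressor(n_estimators=100, random_state=0)),
])
model = Pipeline([("prep", preprocess), ("vote", ensemble)])
model.fit(X_train, y_train)

pred = model.predict(X_test)
r2 = r2_score(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"R2 = {r2:.4f}, RMSE = {rmse:.4f}")
```

Scores from this synthetic run are not comparable to the figures reported in the abstract. Placing the imputer and encoder inside the pipeline means they are fitted on the training split only, which keeps the held-out evaluation free of preprocessing leakage.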
License
Copyright (c) 2025 Aesha Bhardwaj, Sanjay Silakar, Rajeev Pandey, Jashwant Samar

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Re-users must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. This license allows for redistribution, commercial and non-commercial, as long as the original work is properly credited.