Unmasking the sky: high-resolution PM<sub>2.5</sub> prediction in Texas using machine learning techniques.
Date
2024-04
Abstract
Background
Although PM2.5 (fine particulate matter with an aerodynamic diameter less than 2.5 µm) is an air pollutant of great concern in Texas, limited regulatory monitors pose a significant challenge for decision-making and environmental studies.
Objective
This study aimed to predict PM2.5 concentrations at a fine spatial scale on a daily basis by using novel machine learning approaches and incorporating satellite-derived Aerosol Optical Depth (AOD) and a variety of weather and land use variables.
Methods
We compiled a comprehensive dataset in Texas from 2013 to 2017, including ground-level PM2.5 concentrations from regulatory monitors; AOD values at 1-km resolution based on images retrieved from the MODIS satellite; and weather, land-use, and population density variables, among others. We built predictive models for each year separately to estimate PM2.5 concentrations using two machine learning approaches, gradient boosted trees and random forest. We evaluated model prediction performance using in-sample and out-of-sample validations.
Results
Our predictive models demonstrate excellent in-sample model performance, as indicated by high R² values generated from the gradient boosting models (0.94-0.97) and random forest models (0.81-0.90). However, the out-of-sample R² values fall within a range of 0.52-0.75 for gradient boosting models and 0.44-0.69 for random forest models. Model performance varies slightly across years. A generally decreasing trend in predicted PM2.5 concentrations over time is observed in Eastern Texas.
Impact statement
We utilized machine learning approaches to predict PM2.5 levels in Texas. Both gradient boosting and random forest models perform well. Gradient boosting models perform slightly better than random forest models. Our models showed excellent in-sample prediction performance (R² > 0.9).
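The gap between in-sample and out-of-sample R² reported in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn's `GradientBoostingRegressor` and `RandomForestRegressor`, a simple 80/20 hold-out split, and synthetic data. The predictor names, the data-generating formula, and the hold-out design are all hypothetical stand-ins, not the study's actual data, software, or validation scheme.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def make_synthetic_data(n_samples=600, seed=0):
    """Hypothetical stand-in for the real predictors (AOD, weather, land use)."""
    rng = np.random.default_rng(seed)
    aod = rng.uniform(0.0, 1.0, n_samples)
    temperature = rng.normal(20.0, 8.0, n_samples)
    wind_speed = rng.uniform(0.0, 10.0, n_samples)
    X = np.column_stack([aod, temperature, wind_speed])
    # Invented relationship plus noise; not the paper's data.
    noise = rng.normal(0.0, 2.0, n_samples)
    y = 5.0 + 12.0 * aod + 0.1 * temperature - 0.4 * wind_speed + noise
    return X, y


def evaluate(model, X, y, seed=0):
    """Fit on a training split; score on both the training and held-out data."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model.fit(X_train, y_train)
    return {
        "in_sample_r2": r2_score(y_train, model.predict(X_train)),
        "out_of_sample_r2": r2_score(y_test, model.predict(X_test)),
    }


X, y = make_synthetic_data()
models = {
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
results = {name: evaluate(model, X, y) for name, model in models.items()}

for name, scores in results.items():
    print(
        f"{name}: in-sample R2 = {scores['in_sample_r2']:.2f}, "
        f"out-of-sample R2 = {scores['out_of_sample_r2']:.2f}"
    )
```

In-sample R² is computed on the same observations used for fitting, so tree ensembles typically score higher there than on held-out data. That is why the out-of-sample figures are the more informative measure of how well a model might predict at unmonitored locations.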
Publication Info
Zhang, Kai, Jeffrey Lin, Yuanfei Li, Yue Sun, Weitian Tong, Fangyu Li, Lung-Chang Chien, Yiping Yang, et al. (2024). Unmasking the sky: high-resolution PM2.5 prediction in Texas using machine learning techniques. Journal of Exposure Science & Environmental Epidemiology. doi:10.1038/s41370-024-00659-w. Retrieved from https://hdl.handle.net/10161/30657.
This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.
Scholars@Duke

Sheng Luo
Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.