Comparison of Data Augmentation Techniques for pm2.5 Time Series Prediction

Authors

  • Anibal Flores, Hugo Tito-Chura, Deymor Centty-Villafuerte, Alejandro Ecos-Espino

Keywords:

pm2.5 prediction; data augmentation; time series; deep learning

Abstract

PM2.5 time series prediction is an important task. Many approaches have been implemented in the literature, but only two use data augmentation to improve results. In this work, three data augmentation techniques are implemented and analyzed: the first (DA1) is time-warping + jittering and the second (DA2) is linear interpolation, both typical of the state of the art, while the third (DA3) is polynomial interpolation, which is proposed in this work. The performance of the data augmentation techniques is evaluated with four deep learning models: Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Bidirectional GRU (BiGRU). In terms of RMSE, MAPE, and R2, the results show that DA2 and DA3 are superior to DA1 by between 3.62% and 4.04%. DA2 and DA3 perform similarly; the main difference between them is the higher computational cost of DA3 relative to DA2.
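
The three augmentation families named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the warp and noise parameters, the choice to insert one synthetic value between each pair of consecutive observations, and the use of a local cubic polynomial for DA3 are all assumptions made here for illustration.

```python
import numpy as np


def da1_time_warp_jitter(series, warp=0.2, sigma=0.03, seed=0):
    """DA1-style sketch (assumed form): warp the time axis, then add noise."""
    series = np.asarray(series, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(series)
    # Random positive step sizes give a monotone, distorted time axis.
    steps = 1.0 + rng.uniform(-warp, warp, size=n - 1)
    warped_t = np.concatenate(([0.0], np.cumsum(steps)))
    warped_t *= (n - 1) / warped_t[-1]  # rescale to the original time span
    warped = np.interp(warped_t, np.arange(n), series)
    # Jittering: Gaussian noise scaled by the series' standard deviation.
    noise = rng.normal(0.0, sigma * np.std(series), size=n)
    return warped + noise


def da2_linear(series):
    """DA2-style sketch: insert the linear midpoint between neighbours."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    t_new = np.linspace(0, n - 1, 2 * n - 1)  # 0, 0.5, 1, ..., n-1
    return np.interp(t_new, np.arange(n), series)


def da3_polynomial(series, degree=3):
    """DA3-style sketch (assumed form): midpoints from a local polynomial."""
    series = np.asarray(series, dtype=float)
    n, width = len(series), degree + 1
    if n < width:
        raise ValueError("series is too short for the requested degree")
    out = np.empty(2 * n - 1)
    out[0::2] = series  # original observations are kept unchanged
    for i in range(n - 1):
        # Window of degree+1 points around i, clamped at the series ends.
        start = min(max(i - degree // 2, 0), n - width)
        t_local = np.arange(start, start + width) - i
        coeffs = np.polyfit(t_local, series[start:start + width], degree)
        out[2 * i + 1] = np.polyval(coeffs, 0.5)  # value at time i + 0.5
    return out
```

In this sketch DA3 fits one polynomial per inserted point while DA2 needs a single vectorized interpolation call, which is consistent with the abstract's remark that polynomial interpolation costs more to compute than linear interpolation.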

Published

24.03.2024

How to Cite

Anibal Flores, Hugo Tito-Chura, Deymor Centty-Villafuerte, Alejandro Ecos-Espino. (2024). Comparison of Data Augmentation Techniques for pm2.5 Time Series Prediction. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 2312–2319. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5699

Section

Research Article