Abstract
Weather prediction in tropical Indonesia is complicated by high climate variability, persistent El Niño–Southern Oscillation (ENSO) influence, and uneven observational coverage. This study compared five machine learning algorithms (Random Forest (RF), Support Vector Machine (SVM), Long Short-Term Memory (LSTM), XGBoost, and LightGBM) using 187,320 daily records from stations of Indonesia's Meteorology, Climatology, and Geophysics Agency (BMKG), ERA5 reanalysis, and TRMM satellite data (2000–2023). Preprocessing included MMDIF-RF imputation, Z-score normalization, and SMOTE for class imbalance correction. Models were evaluated on RMSE, MAE, R², accuracy, precision, recall, F1-score, and AUC-ROC. LSTM achieved the best performance (RMSE = 3.94 mm; R² = 0.891; F1-score = 0.887; AUC-ROC = 0.941), reflecting its capacity to capture long-range temporal dependencies. XGBoost and LightGBM delivered competitive accuracy at 8–18 times lower training cost, while SVM recorded the lowest accuracy with the highest computational demand. Regional analysis showed that station density and data completeness were more consequential than algorithm choice: LSTM RMSE ranged from 3.61 mm in West Java to 5.43 mm in East Nusa Tenggara. A tiered hybrid approach is recommended: LightGBM or XGBoost for routine forecasting and LSTM for extreme event detection, alongside expanded BMKG coverage in eastern Indonesia.
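As a rough illustration of the regression metrics named in the abstract, the sketch below computes RMSE, MAE, and R² from paired observed and predicted rainfall values using their standard textbook definitions. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study; the study's own evaluation code is not shown in the source.

```python
import math


def regression_metrics(observed, predicted):
    """Return (rmse, mae, r2) for two equal-length sequences of values in mm."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    # Signed prediction errors, one per day
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)                 # root mean squared error
    mae = sum(abs(e) for e in errors) / n        # mean absolute error
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    # R² is undefined when the observations have zero variance
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return rmse, mae, r2


# Hypothetical daily rainfall values (mm), for illustration only
observed = [0.0, 5.0, 10.0, 15.0]
predicted = [1.0, 4.0, 12.0, 13.0]
rmse, mae, r2 = regression_metrics(observed, predicted)
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE does, which is one reason the two are commonly reported together for rainfall forecasts.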
Author Biographies
Nofirman Nofirman
Universitas Prof Dr Hazairin SH, Bengkulu City, Bengkulu Province, Indonesia
Munawir Munawir
Universitas Bali Internasional Muhammadiyah Bali, Denpasar City, Bali Province, Indonesia
Fegie Yoanti Wattimena
Universitas Ottow Geissler, Jayapura City, Papua Province, Indonesia