Research Article | Peer-Reviewed

Exploring a Novel Approach of K-mean Gradient Boosting Algorithm with PCA for Drought Prediction

Received: 11 June 2024     Accepted: 8 July 2024     Published: 23 July 2024
Abstract

Drought poses a significant threat to essential resources such as food, land, and public health. Machine Learning (ML) has emerged as a powerful tool in weather forecasting, and ML models can capture complex atmospheric systems, including those affected by climate change, often with greater precision than traditional forecasting methods. However, predicting drought remains challenging because droughts are unevenly distributed and vary in severity. To tackle this challenge, this study explores a novel approach that combines K-means++ clustering with the Gradient Boosting Algorithm (KGBA), using Principal Component Analysis (PCA) for dimensionality reduction. The KGBA model was developed and evaluated on a dataset of 2,756,796 US Drought Monitor records spanning 2000 to July 2016. The results showed comparatively strong precision and recall, particularly in forecasting extreme and exceptional drought periods: KGBA attained precision of 33% and 74% and recall of 72% and 77% for extreme and exceptional drought periods, respectively. The model's overall accuracy across all drought classes was 46%, slightly better than that of the closest-performing ensemble methods. These findings underscore the potential of KGBA to enhance predictive capabilities for drought mitigation efforts, as it outperformed other models such as Gradient Boosting, Random Forest, Naive Bayes, and K-Nearest Neighbor.
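The abstract reports per-class precision and recall alongside a single overall accuracy figure. As a minimal sketch of how these metrics relate in a multi-class setting, the Python below computes all three from scratch. The labels are hypothetical toy data, not records from the study, and the function names are illustrative only.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for a multi-class problem."""
    labels = sorted(set(y_true) | set(y_pred))
    predicted = Counter(y_pred)  # how often each class was predicted
    actual = Counter(y_true)     # how often each class truly occurs
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    scores = {}
    for label in labels:
        # Precision: of the records predicted as this class, the share that were correct.
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        # Recall: of the records truly in this class, the share that were found.
        recall = true_pos[label] / actual[label] if actual[label] else 0.0
        scores[label] = (precision, recall)
    return scores


def accuracy(y_true, y_pred):
    """Share of all records whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels (NOT the paper's data): 0 = no drought, 1 = extreme drought.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]

scores = per_class_precision_recall(y_true, y_pred)
overall = accuracy(y_true, y_pred)

for label, (precision, recall) in scores.items():
    print(f"class {label}: precision={precision:.2f} recall={recall:.2f}")
print(f"overall accuracy={overall:.2f}")
```

In this toy case the rare class is recovered three times out of four (recall 3/4) at the cost of many false alarms (precision 3/7), while overall accuracy is only 5/10. The same pattern, high recall on a rare severe class alongside modest precision and accuracy, is consistent with the kind of figures quoted in the abstract, though the actual class balance in the study is not stated here.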

Published in American Journal of Data Mining and Knowledge Discovery (Volume 9, Issue 1)
DOI 10.11648/j.ajdmkd.20240901.11
Page(s) 1-19
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

K-means++, Gradient Boosting, Drought, Principal Component Analysis, Machine Learning, Climate Change

Cite This Article
  • APA Style

    Ayinla, B. I., Abdulsalam, R. A. (2024). Exploring a Novel Approach of K-mean Gradient Boosting Algorithm with PCA for Drought Prediction. American Journal of Data Mining and Knowledge Discovery, 9(1), 1-19. https://doi.org/10.11648/j.ajdmkd.20240901.11

    ACS Style

    Ayinla, B. I.; Abdulsalam, R. A. Exploring a Novel Approach of K-mean Gradient Boosting Algorithm with PCA for Drought Prediction. Am. J. Data Min. Knowl. Discov. 2024, 9(1), 1-19. doi: 10.11648/j.ajdmkd.20240901.11

    AMA Style

    Ayinla BI, Abdulsalam RA. Exploring a Novel Approach of K-mean Gradient Boosting Algorithm with PCA for Drought Prediction. Am J Data Min Knowl Discov. 2024;9(1):1-19. doi: 10.11648/j.ajdmkd.20240901.11

  • BibTeX

    @article{10.11648/j.ajdmkd.20240901.11,
      author = {Babatunde Isaiah Ayinla and Rasheedat Aderonke Abdulsalam},
      title = {Exploring a Novel Approach of K-mean Gradient Boosting Algorithm with PCA for Drought Prediction},
      journal = {American Journal of Data Mining and Knowledge Discovery},
      volume = {9},
      number = {1},
      pages = {1-19},
      doi = {10.11648/j.ajdmkd.20240901.11},
      url = {https://doi.org/10.11648/j.ajdmkd.20240901.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajdmkd.20240901.11},
     year = {2024}
    }
    

  • RIS

    TY  - JOUR
    T1  - Exploring a Novel Approach of K-mean Gradient Boosting Algorithm with PCA for Drought Prediction
    
    AU  - Babatunde Isaiah Ayinla
    AU  - Rasheedat Aderonke Abdulsalam
    Y1  - 2024/07/23
    PY  - 2024
    N1  - https://doi.org/10.11648/j.ajdmkd.20240901.11
    DO  - 10.11648/j.ajdmkd.20240901.11
    T2  - American Journal of Data Mining and Knowledge Discovery
    JF  - American Journal of Data Mining and Knowledge Discovery
    JO  - American Journal of Data Mining and Knowledge Discovery
    SP  - 1
    EP  - 19
    PB  - Science Publishing Group
    SN  - 2578-7837
    UR  - https://doi.org/10.11648/j.ajdmkd.20240901.11
    VL  - 9
    IS  - 1
    ER  - 
