Abstract
In West Africa, accurate temperature predictions are essential for agriculture, health, and energy planning, as climate change and increasing heat pose a high risk. This study develops an open and reproducible pipeline for predicting monthly surface temperature anomalies using ERA5 reanalysis inputs and interpretable machine-learning models. The predictor variables include land–atmosphere fluxes, soil moisture, radiation conditions, circulation fields, and oceanic indices, which are processed into anomalies and lagged features to capture persistence and memory. The results show that machine-learning models consistently outperform the climatological and persistence baselines, with the strongest gains occurring during transition seasons and over semi-arid regions where land–atmosphere coupling is strong. Interpretability analysis reveals physically relevant relationships: soil-moisture deficits drive positive anomalies through reduced evaporative cooling; shortwave radiation and cloud cover modulate the surface energy balance; and lagged anomalies encode land-surface memory. Oceanic predictors, especially Gulf of Guinea SST, contribute during the transitional months but are secondary to local feedbacks. Case studies and sensitivity analyses confirm the robustness of these mechanisms while identifying coastal gradients and strongly convective periods as conditions of reduced skill. The findings suggest that machine learning provides efficient and physically consistent predictions of West African temperature anomalies, offering practical value for climate services in the agriculture, health, and energy sectors. The released pipeline and artifacts support reproducibility, collaboration with regional institutions, and integration with dynamical forecasts, advancing climate-informed decision support in the region.
Published in: American Journal of Artificial Intelligence (Volume 9, Issue 2)
DOI: 10.11648/j.ajai.20250902.30
Page(s): 310-323
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2025. Published by Science Publishing Group
Keywords
West Africa, Temperature Anomalies, Machine Learning, Climate Prediction
1. Introduction
West Africa exhibits rich and spatially heterogeneous climate variability governed by the seasonal march of the West African Monsoon, meridional shifts of the intertropical convergence zone, ocean–atmosphere interactions along the Gulf of Guinea, and land–atmosphere coupling across the Sahel and adjoining semi-arid zones [1]. Interannual fluctuations in monsoon onset, peak intensity, and withdrawal reshape temperature and moisture regimes from the humid Guinea Coast to the arid southern Sahara. Superimposed on these seasonal cycles are intraseasonal disturbances, including easterly waves and organized mesoscale convective systems, and low-frequency modes of variability related to tropical Atlantic and Pacific sea-surface temperatures. In recent decades, strong warming trends and an increased frequency of hot days and warm nights have compounded these natural variations, raising the salience of timely temperature information for planning and risk management.
The societal stakes are substantial. Agriculture depends on temperature not only as a direct control on phenology and yield potential but also as a modulator of evapotranspiration and soil moisture stress, particularly during critical stages of cereal crops prevalent in the Sahel and savanna belts [3]. Health systems face heat-related risks that escalate with persistent warm anomalies, including heightened thermal stress, exacerbation of cardiopulmonary conditions, and indirect effects mediated through water and vector-borne disease dynamics. The energy sector, spanning fast-growing urban centers and expanding rural electrification, experiences temperature-sensitive demand profiles due to cooling needs and the performance of distributed renewable resources. Urban heat amplification further intensifies human exposure, and infrastructure operations including generation, transmission, and storage are sensitive to ambient temperatures and their departures from expected climatological conditions. Across these domains, decision makers benefit from reliable information on temperature anomalies at monthly scales and subregional detail, where many operational triggers and policy thresholds are defined [4].
Forecast production for this region has advanced through global and regional numerical models, yet coarse spatial resolution, bias structures, and imperfect representation of land–atmosphere feedbacks often limit actionable skill at the scales needed for sectoral decisions. Subseasonal-to-seasonal prediction systems tend to prioritize precipitation and circulation diagnostics, while near-surface temperature, although generally more predictable, can still suffer from local errors introduced by terrain smoothing, coastal gradients, and simplified surface schemes. Downscaling strategies improve spatial detail but can inherit dynamical model biases or require long calibration periods that are difficult to maintain as observing systems evolve. Consequently, there is continued value in methods that leverage the growing body of physically consistent reanalysis products to extract temperature signals and their drivers where traditional approaches underperform or provide limited transparency.
Reanalyses integrate diverse observations with numerical weather prediction models to yield spatially and temporally coherent fields for key surface and lower-tropospheric variables. Over West Africa, these datasets provide consistent coverage across data-sparse areas and coastal transition zones, and they resolve covariation among thermodynamic, radiative, and soil moisture fields relevant to temperature anomalies. The breadth of available variables enables the construction of predictors that reflect known mechanisms, such as soil-moisture–temperature coupling, advection by low-level jets, cloud-radiation effects, and remote forcing from the tropical oceans, while maintaining a uniform spatial grid suitable for regional analysis. At monthly scales, reanalysis inputs support both nowcasting and short-lead forecasting through lagged relationships, slowly varying oceanic states, and persistence in land surface conditions.
Machine learning provides a complementary path for translating these multivariate reanalysis fields into temperature anomaly predictions with a focus on empirical skill and model inspection. Tree-based ensembles and gradient boosting methods accommodate nonlinear relationships and interactions among predictors that arise in monsoon transition seasons and coastal–inland gradients. Generalized additive models and modern glass-box approaches, such as explainable boosting machines, retain interpretability by learning smooth and low-dimensional response functions that can be examined directly. Model-agnostic tools (permutation importance, accumulated local effects, and Shapley value analysis) facilitate systematic attribution of predicted anomalies to individual features or feature groups, supporting physical plausibility checks and communication with stakeholders who require transparent rationale for forecasts that may influence resource allocation.
Figure 1. Average Temperature Anomaly by Region.
This study develops an open and reproducible pipeline for predicting monthly near-surface air temperature anomalies across West Africa using reanalysis inputs and interpretable machine-learning models. The pipeline standardizes data acquisition, anomaly computation, feature engineering, validation design, and reporting, thereby enabling consistent application across subregions and time periods. It is organized to separate configuration from computation, allowing rapid experimentation with bounding boxes, variable sets, and lead times while preserving metadata and versioning necessary for scientific reproducibility and future audits. The framework accommodates both lead-0 (same-month) and lead-1 (one-month-ahead) tasks to address operational needs ranging from monitoring to anticipatory planning.
The work emphasizes interpretable models and diagnostics that align with process-level understanding. Baselines are used to set reference skill expectations by calendar month and anomaly persistence, allowing complex models to be judged against transparent alternatives. Interpretability is supported through additive structures, low-order interactions, feature attribution, and response-curve visualization, alongside physical checks on soil-moisture and radiation sensitivities, seasonal conditioning, and spatial coherence with climatic gradients [9]. Modeling choices are guided by predictive performance and by the capacity to produce stable, comprehensible explanations consistent with known mechanisms.
Validation uses time-respecting splits that prevent information leakage and reflect operational forecasting with only antecedent data. Expanding-window cross-validation and recent held-out years test generalization under warming and evolving anomaly distributions. Skill is reported with deterministic metrics (mean absolute error, root-mean-square error, and coefficient of determination) and with the anomaly correlation coefficient to capture phase agreement. Sensitivity analyses assess stability by season and subregion and track consistency of interpretations across shifting background states. These steps provide evidence for both the reliability and the limits of the approach.
The study delivers an open pipeline for West African temperature anomalies, including data recipes, configuration files, and scripts suitable for adaptation by researchers and practitioners [10]. It applies interpretable learning to regional climate prediction, balancing accuracy with explanatory clarity, and provides a toolkit for attribution and diagnostics grounded in physical reasoning. It also establishes a validation strategy tailored to time-dependent climate data, generating out-of-sample assessments that reflect forecasting constraints and identify conditions where guidance is robust. Together, these contributions support climate-informed decision making in agriculture, health, and energy systems, while offering a reusable foundation for future work on temperature-related risks in West Africa.
1.1. Related Work
Previous studies on West African temperature variability highlight the influence of monsoon dynamics, land–atmosphere coupling, and ocean–atmosphere interactions, particularly along the Gulf of Guinea. Reanalysis datasets such as ERA5 have been widely used to provide spatially consistent predictors for empirical modeling in data-sparse regions [14]. Machine-learning applications in climate prediction show that tree-based ensembles and interpretable additive models can capture nonlinear relationships while still providing physically meaningful explanations. Research focusing on African climates consistently identifies soil-moisture memory, radiative controls, and regional SST anomalies as key drivers of monthly temperature anomalies [12]. Building on these findings, the present study integrates ERA5 predictors, lagged features, and interpretable machine-learning models to improve understanding and prediction of monthly temperature anomalies across West Africa.
1.2. Scope and Contributions
This subsection briefly summarizes the specific objectives of the present study and its novel methodological contributions to interpretable climate prediction in West Africa.
West African climate variability arises from the seasonal march of the West African Monsoon, coastal–inland thermal gradients, and interactions with the tropical Atlantic. Teleconnection patterns, especially variability in the eastern equatorial Atlantic known as the Atlantic Niño, modulate regional circulation and surface conditions on interannual scales, shaping temperature and rainfall over the Sahel and the Gulf of Guinea.
Land–atmosphere coupling is a recurring driver of monthly anomalies in the region. Observational and modeling studies report that soil moisture acts as a memory term at the land surface, influencing near-surface temperature via the partitioning of latent and sensible heat fluxes; these effects are strongest in semi-arid belts of the Sahel where evaporative fraction varies widely [12]. Evidence from process and regime-based evaluations over Africa further indicates seasonally dependent coupling “hot spots,” supporting the use of soil-moisture-related predictors in statistical models of temperature anomalies [13].
Reanalysis datasets provide physically consistent, spatially complete inputs for empirical prediction across data-sparse parts of West Africa. ERA5 offers hourly fields at ~31 km resolution from 1979 to present, with widely used surface and lower-tropospheric variables relevant to near-surface temperature variability and its drivers. These characteristics make it well-suited for constructing anomalies, lagged features, and circulation/radiative predictors at monthly lead times [14].
Within the tropical Atlantic sector, studies distinguish how SST variability, including Atlantic Niño flavors, conditions monsoon behavior and low-level advection toward the coast, implying a role for oceanic predictors (and their lags) when forecasting coastal and transition-season temperature anomalies. Incorporating such indices complements land-memory signals and supports hybrid predictor sets that reflect both surface energy balance and remote boundary conditions [15].
Together, prior work supports three pillars for this study’s design: (i) use of ERA5 reanalysis to represent co-varying surface, radiative, and circulation fields; (ii) explicit representation of land-surface memory via lagged temperature and soil-moisture-related variables; and (iii) interpretable machine-learning models that can express nonlinear responses while yielding diagnostic attributions aligned with known West African mechanisms.
2. Methodology
2.1. Data Collection
The study uses the ERA5 reanalysis dataset produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). ERA5 is a high-resolution global dataset that provides hourly estimates of a wide range of atmospheric, oceanic, and land-surface variables. For this research, the following features were extracted from the dataset:
Temperature anomaly at 2 meters (T2M_Anomaly): The target variable, representing the difference between the observed temperature and the long-term average temperature.
Soil moisture (soil_moisture): Moisture content in the soil, which can affect temperature changes.
Sea surface temperature (SST): Surface temperature of the ocean.
Wind components (U10, V10): Zonal and meridional wind components (encoding wind speed and direction) at 10 meters above the Earth's surface.
Surface Solar Radiation (SSR): Solar energy received at the surface.
Surface sensible heat flux (SSHF): Heat flux from the Earth's surface to the atmosphere.
Lat/Lon Coordinates: Geographical information for regional analysis.
The data span 1981 to 2024 and cover all sub-regions of West Africa.
2.2. Feature Engineering
Feature engineering is a crucial step in enhancing model performance by capturing both temporal and spatial information from the dataset. The following features are engineered:
Calendar Features: Because temperature anomalies show seasonal variation, sine and cosine transformations of the month are added to represent the cyclic nature of the year. Seasonal indicators are also added to classify months into seasons (e.g., winter, summer).
Lagged Features: Temperature anomalies from previous months are included to capture temporal dependence and persistence in the temperature data. For example, the lag-1 feature represents the temperature anomaly from the previous month.
Rolling Average: A 3-month rolling average is calculated for each feature to capture broader trends and smooth short-term fluctuations.
EOF Analysis: Empirical orthogonal function (EOF) analysis is applied to selected variables (e.g., soil moisture, SST) to extract dominant spatial patterns and reduce dimensionality.
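The calendar, lagged, and rolling-average features described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the study's released code. It assumes one record per region and month with no gaps in the series. The column names sin_m, cos_m, and t2m_anom_lag1 follow Table 1, while t2m_anom_roll3, the function name, and the choice to average only strictly earlier months (to avoid leaking the current target) are assumptions made for the example.

```python
import math
from collections import defaultdict


def engineer_features(rows, target="t2m_anomaly", window=3):
    """Add harmonic month encodings, a lag-1 anomaly, and a trailing mean.

    `rows` is a list of dicts with keys 'region', 'year', 'month' and `target`.
    Assumes each region's series is monthly and contiguous (no missing months).
    """
    by_region = defaultdict(list)
    for row in rows:
        by_region[row["region"]].append(row)

    out = []
    for series in by_region.values():
        # Lags must be computed in time order within each region.
        series = sorted(series, key=lambda r: (r["year"], r["month"]))
        history = []  # target values from strictly earlier months
        for row in series:
            angle = 2.0 * math.pi * (row["month"] - 1) / 12.0
            feat = dict(row)
            feat["sin_m"] = math.sin(angle)
            feat["cos_m"] = math.cos(angle)
            # Previous month's anomaly; undefined for the first record.
            feat["t2m_anom_lag1"] = history[-1] if history else None
            # Trailing mean over the previous `window` months only.
            recent = history[-window:]
            feat["t2m_anom_roll3"] = (
                sum(recent) / window if len(recent) == window else None
            )
            out.append(feat)
            history.append(row[target])
    return out
```

Records at the start of each regional series receive None for the lagged and rolling features and would typically be dropped before model fitting.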
2.3. Model Development
The study uses several machine learning models for temperature anomaly prediction. The following models are used:
1) Linear Models:
a) Linear Regression: A baseline model to predict temperature anomalies using basic relationships between features.
b) Ridge and Lasso Regression: Regularized linear models that help prevent overfitting by shrinking coefficients.
c) ElasticNet: Combines Lasso and Ridge regularization, useful when there is a mix of highly correlated features.
2) Tree-based Models:
a) Random Forest: An ensemble learning model that builds many decision trees and combines their predictions for better accuracy.
b) XGBoost: An efficient and scalable gradient boosting algorithm that excels in performance for tabular data.
c) LightGBM: A gradient boosting model optimized for speed and memory efficiency, useful for large datasets.
3) Interpretable Models:
a) Explainable Boosting Machine (EBM): A transparent model that captures both linear and non-linear relationships. This is crucial for understanding how different features (e.g., soil moisture, SST) affect temperature anomalies.
b) Generalized Additive Models (GAM): An interpretable model that fits smooth, non-linear functions to the data and shows how individual features affect predictions.
2.4. Cross-Validation and Hyperparameter Tuning
The models are evaluated using expanding-window cross-validation, which simulates real-world forecasting by training on earlier data and validating on subsequent data. This approach helps to avoid data leakage and ensures that the models generalize to unseen data. Training and validation splits are based on annual increments, ensuring that future data are never used for training. Hyperparameter optimization is performed using randomized-search cross-validation to find the best combination of model parameters, such as learning rate, tree depth, and regularization strength [9].
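An expanding-window split by calendar year might look like the sketch below. It illustrates the general technique rather than the study's exact configuration: the function name and the defaults min_train_years and val_years are hypothetical, since the source does not state the fold sizes.

```python
def expanding_window_splits(years, min_train_years=10, val_years=1):
    """Yield (train_idx, val_idx) pairs for expanding-window validation.

    `years` holds the calendar year of each sample. Every fold trains on all
    years before a cutoff and validates on the `val_years` that follow, so no
    future sample ever enters training. Defaults are illustrative only.
    """
    unique = sorted(set(years))
    for end in range(min_train_years, len(unique) - val_years + 1, val_years):
        train_years = set(unique[:end])
        held_out = set(unique[end:end + val_years])
        train_idx = [i for i, y in enumerate(years) if y in train_years]
        val_idx = [i for i, y in enumerate(years) if y in held_out]
        yield train_idx, val_idx
```

Each successive fold adds the previous validation years to the training set, which mirrors how an operational system would be retrained as new data arrive. A randomized hyperparameter search can then score each candidate configuration by its average error across these folds.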
2.5. Model Assessment
Several performance metrics are used to evaluate the models' predictions:
Root Mean Squared Error (RMSE): This measures the average magnitude of the prediction error, penalizing large errors more heavily.
Mean Absolute Error (MAE): This represents the average absolute difference between the predicted and observed values.
R-Squared (R²): This indicates the proportion of variance in the temperature anomalies explained by the model.
Anomaly correlation coefficient (ACC): This measures the correlation between predicted and observed anomalies, which is essential for climate-related predictions.
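For concreteness, the four metrics can be written as short functions. This is a sketch under one stated assumption: the ACC is computed here in its centered (Pearson) form, whereas the source does not specify whether a centered or uncentered variant is used. The function names are chosen for the example.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error; squaring weights large errors more heavily."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def acc(obs, pred):
    """Anomaly correlation coefficient in centered (Pearson) form."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov / math.sqrt(var_obs * var_pred)
```

A forecast with a constant offset illustrates why ACC is reported alongside the error metrics: it has perfect phase agreement (ACC of 1) while MAE, RMSE, and R² still register the bias.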
3. Data
The empirical foundation of this study is constructed on a set of well-established reanalysis datasets, complemented by auxiliary indices and spatial delineations that contextualize West African climate variability.
Figure 2. Annual Mean Temperature Anomaly.
The following sections describe the primary dataset, the suite of variables selected, the procedures used to derive anomalies, and additional sources incorporated to enhance robustness and interpretability.
Table 1. Data Dictionary for the West Africa Temperature Anomaly Dataset.
Variable | Description | Type / Units | Role |
date | Monthly timestamp (1981–2024). | DateTime | Metadata |
year | Calendar year. | Integer | Metadata |
month | Calendar month (1–12). | Integer | Metadata / seasonality |
region | Sub-region label (e.g., Sahel_W, Guinea_E, Gulf_of_Guinea). | Categorical | Metadata / grouping |
lat, lon | Latitude and longitude of the regional centroid. | Float (degrees) | Metadata |
set | Dataset split label (train, val, test). | Categorical | Metadata |
sin_m, cos_m | Harmonic encodings of month (annual cycle). | Float | Calendar / feature |
soil_moisture | Root-zone soil water content. Controls evapotranspiration and land feedbacks. | Fraction / volumetric (m³/m³) | Land-surface predictor |
tcwv | Total column water vapour (integrated atmospheric humidity). | kg m⁻² | Atmospheric predictor |
ssr | Surface shortwave radiation (downward solar flux). | W m⁻² | Radiative predictor |
sshf | Surface sensible heat flux (land → air heating). | W m⁻² | Surface energy predictor |
slhf | Surface latent heat flux (moisture/evaporation). | W m⁻² | Surface energy predictor |
u10, v10 | 10 m zonal and meridional wind components (east–west, north–south). | m s⁻¹ | Circulation predictor |
sst_gg_anom | Gulf of Guinea sea surface temperature anomaly. | °C (anomaly) | Oceanic predictor |
nino34 | Niño-3.4 index (central Pacific ENSO signal). | °C (anomaly) | Teleconnection predictor |
atl3 | Atlantic Niño index (equatorial Atlantic SST anomaly). | °C (anomaly) | Teleconnection predictor |
t2m_anom_lag1 | Previous month’s 2 m air temperature anomaly (persistence). | °C (anomaly) | Lagged predictor |
t2m_anomaly | Target: 2 m near-surface air temperature anomaly (relative to 1991–2020 mean). | °C (anomaly) | Prediction target |
3.1. ERA5 Reanalysis
The principal dataset for this research is the ERA5 global atmospheric reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). ERA5 offers a coherent reconstruction of atmospheric and surface conditions beginning in 1979 and continuing to the present, assimilating a broad spectrum of satellite and in situ observations into a consistent numerical weather prediction model. The data are provided at a spatial resolution of 0.25° by 0.25° and at hourly temporal frequency, which can be aggregated into monthly means for the purposes of anomaly prediction.
For this study, the focus is on near-surface and lower-tropospheric variables that are physically linked to temperature variability. Single-level fields include 2-meter air temperature, 2-meter dewpoint, 10-meter winds, surface pressure, skin temperature, total column water vapor, cloud cover, surface latent and sensible heat fluxes, soil moisture at multiple depths, evaporation, and radiative fluxes at the surface. Pressure-level variables are also incorporated, particularly temperature, geopotential height, humidity, wind components, and vertical velocity at 925, 850, 700, and 500 hPa, which together characterize the circulation regimes influencing near-surface thermal conditions.
3.2. Preprocessing and Anomaly Definition
All ERA5 variables are aggregated to monthly means over the spatial domain covering 20°W–20°E and 0°–25°N, encompassing the Sahel, the Guinea Coast, and transitional eco-climatic zones. The primary predictand, 2-meter air temperature, is converted to anomalies by removing the long-term monthly climatology. The baseline climatology is defined over the 1991–2020 period, consistent with World Meteorological Organization standards. For each calendar month, the climatological mean is subtracted from the corresponding monthly values, producing an anomaly time series that isolates departures from expected conditions. This procedure is also applied to predictor variables where relevant, such as soil moisture and radiative fluxes, to reduce seasonality and emphasize variability.
Figure 3. Distribution of Temperature Anomalies.
To avoid artificial skill, predictors are preprocessed with care to ensure temporal integrity. Lagged variables and rolling averages are computed within each region or grid cell to preserve consistency. Predictors that could introduce contemporaneous leakage of the target, such as same-month 2-meter temperature, are excluded when constructing forecast models for lead times beyond zero.
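The anomaly definition used in this section, in which the 1991–2020 mean for each calendar month is subtracted from the monthly values, can be sketched as below. The record layout, the function name, and the input key t2m are assumptions for illustration; the output key t2m_anomaly matches Table 1.

```python
from collections import defaultdict


def monthly_anomalies(rows, var="t2m", base_start=1991, base_end=2020):
    """Return copies of `rows` with `<var>_anomaly` added.

    The climatology is the mean of `var` per (region, calendar month) over the
    baseline years only. Records outside the baseline still receive anomalies
    relative to that fixed baseline. A (region, month) pair with no baseline
    data raises KeyError.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        if base_start <= r["year"] <= base_end:
            key = (r["region"], r["month"])
            sums[key] += r[var]
            counts[key] += 1
    climatology = {key: sums[key] / counts[key] for key in sums}

    anomaly_key = var + "_anomaly"
    return [
        dict(r, **{anomaly_key: r[var] - climatology[(r["region"], r["month"])]})
        for r in rows
    ]
```

The same function can be applied to predictors such as soil moisture or radiative fluxes by changing `var`.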
3.3. Teleconnection Indices
Large-scale modes of climate variability are represented through established indices that condense basin-wide processes into single time series. The Niño3.4 index is used to characterize El Niño–Southern Oscillation phases in the central equatorial Pacific. The Atlantic Niño index (ATL3) describes anomalous SSTs in the equatorial Atlantic, while the Indian Ocean Dipole index captures zonal SST gradients in the equatorial Indian Ocean.
Figure 4. Monthly Climatology of Temperature Anomalies.
These indices are included as potential predictors of West African temperature anomalies, recognizing their documented influence on regional circulation patterns and seasonal transitions. Lagged forms of these indices are constructed to capture delayed atmospheric responses and enhance forecast skill at lead times of one or more months [16].
3.4. Regional Mask and Spatial Aggregation
The West African domain is delineated geographically, with the bounding box specified above, and subregions are defined to capture spatial heterogeneity. These include the Sahel zone, the Guinea Coast, and selected national or ecological subdivisions depending on the scale of analysis. A regional mask is applied to ensure consistency of spatial averaging and to facilitate the production of region-specific skill metrics. The combination of domain-wide and subregional perspectives provides both a broad climatological assessment and insights relevant to local decision-making contexts.
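One common way to implement masked spatial averaging on a regular latitude–longitude grid is a box mask combined with cosine-of-latitude weights, sketched below. The cos(latitude) weighting and the function name are assumptions for illustration; the source does not state how grid cells are weighted.

```python
import math


def regional_mean(field, lats, lons, lat_bounds, lon_bounds):
    """Area-weighted mean of `field` inside a latitude/longitude box.

    `field[i][j]` is the value at latitude `lats[i]` and longitude `lons[j]`.
    Cells are weighted by cos(latitude), which approximates the shrinking area
    of grid cells away from the equator on a regular grid.
    """
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        if not lat_bounds[0] <= lat <= lat_bounds[1]:
            continue
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            if lon_bounds[0] <= lon <= lon_bounds[1]:
                total += weight * field[i][j]
                weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("no grid cells fall inside the requested box")
    return total / weight_sum
```

For the full study domain, the call would use lat_bounds=(0.0, 25.0) and lon_bounds=(-20.0, 20.0); subregions use narrower boxes or, for irregular shapes, a per-cell mask in place of the box test.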
Figure 5. Spatial Skill (ACC).
The data architecture integrates ERA5 as the central reanalysis resource, supplemented by optional reanalysis, sea surface temperatures, and teleconnection indices, all preprocessed into anomalies relative to a common baseline. The use of a clearly defined regional mask ensures that the derived datasets align with the geographical and sectoral focus of the study. This structured approach supports reproducibility, comparability, and physical interpretability in subsequent modeling and analysis.
4. Results
This section reports predictive skill relative to simple reference models, the seasonal and spatial organization of that skill, and insights from model interpretability. It concludes with focused case studies and sensitivity analyses that probe robustness and illuminate process-level mechanisms.
4.1. Overall Skill Relative to Baselines
Across the West Africa domain, statistical models trained on reanalysis predictors outperform both climatology (calendar-month mean) and anomaly persistence baselines. Improvements are clearest for lead-0 and remain present, though attenuated, for lead-1. Gains are reflected in lower MAE and RMSE and in higher anomaly correlation coefficients (ACC), indicating better phase alignment with observed anomalies.
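The two reference forecasts can be sketched as follows. This is an illustrative reading of the baselines, not the study's code: the climatology baseline predicts the training-period mean anomaly for each calendar month, and the persistence baseline carries the previous month's anomaly forward. The function names and the fallback to a zero anomaly for months absent from training are assumptions; the column names follow Table 1.

```python
from collections import defaultdict


def climatology_forecast(train, test, target="t2m_anomaly"):
    """Predict the training-period mean of `target` for each calendar month."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in train:
        sums[r["month"]] += r[target]
        counts[r["month"]] += 1
    means = {m: sums[m] / counts[m] for m in sums}
    # Assumption: fall back to a zero anomaly for months absent from training.
    return [means.get(r["month"], 0.0) for r in test]


def persistence_forecast(test, lag_col="t2m_anom_lag1"):
    """Predict that last month's anomaly persists unchanged."""
    return [r[lag_col] for r in test]
```

Any candidate model is then scored on the same test records with the same metrics, so that its MAE, RMSE, and ACC are directly comparable with these references.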
Figure 6. Holdout Skill by Model.
Among candidate learners, additive models with smooth nonlinearities (EBM, GAM) and tree-based ensembles (random forest, gradient boosting) deliver the most consistent improvements over baselines. Linear models (Ridge/Elastic Net) reduce error relative to climatology, particularly in the Sahel and transition seasons, but underfit important interactions. EBM and gradient boosting recover additional skill by capturing curvature in soil-moisture and radiative responses and by exploiting persistence encoded in lagged temperature.
Holdout evaluation on recent years confirms these patterns. Skill remains stable under warming backgrounds, with modest degradation at longer lead and during months dominated by rapidly evolving convection. Performance dispersion across subregions underscores the role of local land–sea contrasts and land-surface memory.
Overall performance. On the 2020–2024 holdout, all machine-learning models tested exceed climatology and persistence for Lead-0, with ensemble methods and interpretable additive models ranking highest. Improvements persist at Lead-1, though with reduced amplitude. Seasonal skill is structured: gains peak in transition seasons (MAM/SON) and remain positive but muted in JJA.
4.2. Seasonal and Spatial Organization of Skill
Skill is seasonally structured. Lead-0 exhibits the largest gains in boreal spring (MAM) and autumn (SON), when boundary-layer thermodynamics and radiation drive relatively smooth temperature variability. During boreal summer (JJA), forecast quality remains positive but is moderated by mesoscale convective organization and strong meridional gradients. Winter (DJF) skill is comparatively stable, with coastal and northern Sahel subregions benefitting from reduced convective noise.
Figure 7. Seasonal Cycle of Monthly Temperature Anomalies.
Spatially, improvements concentrate in semi-arid belts from Senegal–Mali eastward into Niger and northern Nigeria, consistent with strong land-atmosphere coupling and longer memory in soil moisture. Coastal zones along the Gulf of Guinea show mixed gains: models reproduce broad warm/cool phases but can under-resolve local gradients associated with upwelling and sea-breeze dynamics. Orographic flanks (e.g., highlands of Guinea and Cameroon) exhibit intermediate skill, reflecting complex local radiation–cloud interactions.
4.3. Interpretability Findings and Mechanistic Hypotheses
Feature attributions and response diagnostics provide convergent explanations for the patterns above:
1) Persistence and memory: The lag-1 temperature anomaly is a leading predictor in all models, especially in arid zones. Its positive contribution supports a view in which soil heat storage and slowly varying land-surface states transmit information across monthly scales.
2) Soil moisture coupling: ALE and SHAP analyses show a monotonic, physically coherent relationship in which lower soil moisture is associated with positive temperature anomalies via reduced evaporative cooling and increased sensible heat flux. The effect intensifies in JJA and over the Sahel, consistent with strong coupling regimes.
3) Radiative controls: Net shortwave at the surface contributes positively to warm anomalies across months, with a steeper slope in clear-sky conditions. Cloud proxies (e.g., total cloud cover) enter with negative associations, modulating shortwave and longwave terms in the expected direction.
4) Moist static environment: Total column water vapor and near-surface humidity exhibit seasonally varying effects—weakly positive at low to moderate values (warmer nights, suppressed evaporative fraction), flattening or reversing when associated with persistent cloudiness and convective regimes.
5) Circulation and advection: Low-level wind components (u10, v10) display regionally structured responses. Westerly anomalies (positive u) along the coast link to marine air advection and moderated temperatures, while northerly components (negative v) in the Sahel can precede warm anomalies through dry advection.
6) Oceanic boundary conditions: Gulf of Guinea and equatorial Atlantic SST anomalies load positively on coastal temperature anomalies with a short lag, with weaker but detectable domain-wide effects at lead-1 during transition seasons. ENSO (Niño3.4) and ATL3 indices contribute modestly and primarily through their modulation of circulation and SST patterns rather than as stand-alone predictors.
These findings motivate a mechanistic hypothesis in which monthly temperature anomalies arise from the joint action of (i) land-surface memory and evaporative partitioning, (ii) radiative forcing shaped by cloud regimes, and (iii) low-level advective patterns tied to oceanic states. The balance among these mechanisms varies seasonally and across the coastal–inland gradient.
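To make the ALE diagnostic concrete, the sketch below computes a first-order accumulated local effects curve for one feature of a generic prediction function. It is a simplified illustration under stated assumptions: `ale_1d`, the quantile binning, the toy linear model, and the synthetic soil-moisture and shortwave values are hypothetical and do not reproduce the study's models or data.

```python
import numpy as np

def ale_1d(predict, X, feature, n_bins=5):
    """First-order ALE of `feature`; returns (bin edges, centered ALE at the edges)."""
    X = np.asarray(X, float)
    x = X[:, feature]
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    # Bin k holds values in (edges[k], edges[k+1]]; the minimum goes into bin 0.
    idx = np.clip(np.searchsorted(edges, x, side="left") - 1, 0, len(edges) - 2)
    local = np.zeros(len(edges) - 1)
    for k in range(len(edges) - 1):
        rows = X[idx == k]
        if len(rows) == 0:
            continue
        lo, hi = rows.copy(), rows.copy()
        lo[:, feature], hi[:, feature] = edges[k], edges[k + 1]
        # Mean change in prediction when only this feature moves across the bin.
        local[k] = np.mean(predict(hi) - predict(lo))
    ale = np.concatenate([[0.0], np.cumsum(local)])
    # Center so that the count-weighted mean of bin-midpoint effects is zero.
    counts = np.bincount(idx, minlength=len(edges) - 1)
    ale -= np.sum((ale[:-1] + ale[1:]) / 2 * counts) / counts.sum()
    return edges, ale

# Synthetic, correlated predictors: drier soil coincides with more shortwave.
rng = np.random.default_rng(0)
soil = rng.uniform(0.1, 0.4, 200)
shortwave = 250 - 200 * soil + rng.normal(0, 5, 200)
X = np.column_stack([soil, shortwave])

def toy_model(X):
    """Hypothetical response: warmer when soil is drier and shortwave is higher."""
    return -2.0 * X[:, 0] + 0.01 * X[:, 1]

edges, ale = ale_1d(toy_model, X, feature=0)
print(np.round(ale, 3))
```

Because each local difference varies only the feature of interest within a narrow bin, the curve isolates the soil-moisture effect even though soil moisture and shortwave are correlated in the synthetic data.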
4.4. Case Studies
Three case studies illustrate these mechanisms and the models’ behavior:
Figure 8. Case Study Guinea.
1) Sahel warm episode during late pre-monsoon (MAM): A widespread positive anomaly is well captured by EBM and boosting models. Attributions emphasize depleted soil moisture, enhanced shortwave fluxes, and positive lag-1 temperature, with minimal teleconnection influence. Maps of average SHAP values align with regions of strong land–atmosphere coupling.
2) Coastal cool anomaly during early monsoon onset (JJA): Models reproduce the sign and spatial pattern but under-estimate amplitude near sharp SST gradients. Feature effects highlight negative shortwave anomalies and increased onshore flow; residuals concentrate where coastal upwelling is strongest, indicating limits of coarse oceanic predictors for narrow coastal bands.
3) Pan-regional warm month following positive Atlantic equatorial SST (SON): Lead-1 forecasts improve relative to persistence, with SST-linked features gaining weight alongside lagged temperature. Attribution spreads more evenly across oceanic and land variables, consistent with transitional circulation shifts.
4.5. Sensitivity and Robustness
A set of controlled perturbations and re-estimations assesses robustness:
1) Reanalysis source: Swapping ERA5 for an alternative reanalysis preserves relative model rankings and broad attribution patterns, with small absolute skill changes that scale with variable biases. Soil-moisture-related effects are most sensitive to dataset choice, as expected.
2) Feature collinearity: Grouped permutation importance and ALE mitigate spurious importance inflation. When correlated radiation and cloud proxies are grouped, the aggregate effect remains stable and physically interpretable.
3) Teleconnection configuration: Varying index lags (0–3 months) and excluding individual indices has limited impact on lead-0 performance and modest impact for lead-1 in MAM/SON, suggesting that oceanic influences are secondary to local memory at monthly horizons but useful in transitional periods.
4) Non-stationarity: Decadal split tests show small drifts in baseline error consistent with warming; relative gains over baselines remain. Models retain sign-consistent partial responses across decades, indicating stable learned relationships.
5) Data availability: Down-sampling predictors and withholding specific variable groups reduce skill in a manner consistent with their physical roles (largest declines when soil moisture and radiative fluxes are removed). This supports the mechanistic interpretation derived from the full model.
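The grouped permutation importance mentioned in item 2 can be sketched as follows. This is an illustrative implementation under assumptions, not the study's code: the function name, the mean-squared-error metric, the group names, and the synthetic predictors are hypothetical.

```python
import numpy as np

def grouped_permutation_importance(predict, X, y, groups, n_repeats=10, seed=0):
    """Mean increase in MSE when all columns of a group are shuffled together."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, float), np.asarray(y, float)
    base = np.mean((predict(X) - y) ** 2)
    result = {}
    for name, cols in groups.items():
        increases = []
        for _ in range(n_repeats):
            Xp = X.copy()
            perm = rng.permutation(len(X))
            # One shared row permutation keeps within-group correlations intact.
            Xp[:, cols] = X[perm][:, cols]
            increases.append(np.mean((predict(Xp) - y) ** 2) - base)
        result[name] = float(np.mean(increases))
    return result

# Synthetic predictors: cloud cover is strongly anti-correlated with shortwave.
rng = np.random.default_rng(1)
shortwave = rng.normal(size=300)
cloud = -0.8 * shortwave + 0.2 * rng.normal(size=300)
unrelated = rng.normal(size=300)
X = np.column_stack([shortwave, cloud, unrelated])

def toy_model(X):
    """Hypothetical fitted model that ignores the third column."""
    return 1.0 * X[:, 0] - 0.5 * X[:, 1]

y = toy_model(X)
groups = {"radiation_and_cloud": [0, 1], "unrelated": [2]}
importance = grouped_permutation_importance(toy_model, X, y, groups)
print(importance)
```

Shuffling correlated columns jointly avoids evaluating the model on implausible combinations (for example, high shortwave with high cloud cover), which is one reason grouped importance tends to be more stable than per-feature importance under collinearity.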
4.6. Summary
Models trained on reanalysis predictors provide reliable monthly predictions of temperature anomalies in West Africa, outperforming climatology and persistence, especially during transition seasons and over semi-arid subregions. Interpretability tools yield physically consistent attributions centered on land-surface memory, radiative forcing, and low-level advection, with oceanic states contributing at short lead during transitional months. Case studies and sensitivity analyses corroborate these mechanisms and delineate current limits near coastal gradients and during strongly convective periods. These results substantiate the use of interpretable machine learning as a practical complement to dynamical guidance for temperature-related risk management in the region.
5. Discussion
5.1. Physical Consistency and Comparison with Prior Studies
The predictive relationships identified by the interpretable machine-learning framework align well with established physical mechanisms in West African climate research. The consistent negative association between soil moisture and temperature anomalies corroborates prior observational and modeling studies that emphasize land–atmosphere coupling in the Sahel and adjacent semi-arid zones [19]. Positive contributions of net shortwave radiation and negative effects of cloud cover correspond to the expected radiative balance mechanisms reported in both process studies and large-scale reanalysis assessments.
Figure 9. Lag vs Current Anomaly.
Persistence signals from lagged temperature anomalies also reflect the well-documented memory embedded in land surface and boundary layer processes. Oceanic predictors, particularly Gulf of Guinea and Atlantic equatorial SST anomalies, show influence during transition seasons, consistent with previous analyses linking SST variability to monsoon onset and retreat. These consistencies reinforce the plausibility of the learned relationships and suggest that the machine-learning framework has captured physically meaningful signals rather than spurious statistical associations.
5.2. Limitations and Sources of Uncertainty
Despite these encouraging findings, several limitations constrain the scope of inference. First, reliance on reanalysis data introduces potential biases, especially for surface fluxes, soil moisture, and coastal SSTs, where observational density is limited and assimilation relies on model physics. Such biases can propagate into predictors and may lead to systematic distortions in model coefficients or feature attributions. Second, the climate system in West Africa is subject to pronounced non-stationarity, including secular warming and evolving land-use patterns. Models trained on past relationships may experience degradation under novel forcing regimes, and skill in recent holdout years, while positive, suggests gradual shifts in error characteristics. Third, predictor collinearity, common among radiative, moisture, and circulation fields, can complicate attribution. Although permutation importance, accumulated local effects, and grouped analyses mitigate this, residual uncertainty remains in disentangling correlated drivers. Finally, while the models provide insights at monthly scales, they are not designed to capture higher-frequency intraseasonal variability or extremes at daily resolution.
5.3. Practical Implications and Pathways Toward Operations
The demonstrated skill relative to climatology and persistence, combined with transparent interpretability, underscores potential value for operational climate services in West Africa. Agricultural planners could use forecasts of positive anomalies to anticipate heat stress on crops and livestock, while health agencies might integrate anomaly guidance into heat-health early warning systems. The energy sector can employ predicted anomalies for demand projections and management of generation portfolios sensitive to ambient temperatures. From an operational perspective, the modular pipeline offers pathways for integration: real-time reanalysis or near-real-time climate monitoring datasets can provide inputs; the interpretable models can run in lightweight environments; and outputs can be disseminated as probability distributions or categorical outlooks. Importantly, embedding interpretability ensures that sectoral users receive not only predictions but also explanations grounded in known physical drivers, thereby fostering trust and informed decision making. The open pipeline and reproducible artifacts also support adaptation by regional institutions and cross-comparison with dynamical seasonal forecasts, paving the way for hybrid approaches.
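One way a deterministic anomaly forecast could be turned into a categorical outlook is by comparing it against terciles of a historical reference sample. The sketch below assumes a tercile scheme and hypothetical names and values; the source does not specify how categorical outlooks would be constructed.

```python
import numpy as np

def tercile_outlook(forecast, historical):
    """Label each forecast anomaly relative to terciles of a historical sample."""
    lower, upper = np.quantile(np.asarray(historical, float), [1 / 3, 2 / 3])
    labels = np.array(["below-normal", "near-normal", "above-normal"])
    # digitize gives 0 below `lower`, 1 between the bounds, 2 at or above `upper`.
    return labels[np.digitize(np.asarray(forecast, float), [lower, upper])]

# Hypothetical historical anomalies (degrees C) and three forecast values.
historical = np.linspace(-1.0, 1.0, 31)
outlook = tercile_outlook([-0.8, 0.0, 0.9], historical)
print(list(outlook))
```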
6. Conclusion
This study presented a systematic framework for predicting monthly near-surface air temperature anomalies in West Africa using reanalysis inputs and interpretable machine-learning models. By establishing expanding-window validation and holdout testing, the analysis demonstrated consistent improvements over climatology and persistence, with strongest gains during transitional seasons and in semi-arid subregions. Interpretability diagnostics revealed physically coherent relationships involving soil moisture deficits, radiative fluxes, persistence, and oceanic states, consistent with prior literature and mechanistic understanding. Case studies and sensitivity analyses confirmed the robustness of these relationships while delineating current limits near coastal gradients and during convectively dominated periods.
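The expanding-window validation referred to above can be sketched as a generator of train/test splits in which the training period always precedes the test year and grows by one year per fold. The function name and the year ranges are hypothetical and are not taken from the study's configuration.

```python
def expanding_window_splits(years, first_test_year, last_test_year):
    """Yield (train_years, test_year) pairs with training strictly before testing."""
    available = sorted(set(years))
    for test_year in range(first_test_year, last_test_year + 1):
        train = [y for y in available if y < test_year]
        if train and test_year in available:
            yield train, test_year

# Hypothetical record of 2000-2019 with validation folds for 2015-2019.
splits = list(expanding_window_splits(range(2000, 2020), 2015, 2019))
for train, test_year in splits:
    print(f"train {train[0]}-{train[-1]} -> test {test_year}")
```

Keeping every training year earlier than the test year prevents information from the future leaking into model fitting, which matters when the background climate is warming.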
Figure 10. Feature Correlation Heatmap.
The key insights are threefold. First, interpretable machine learning can augment climate prediction in data-sparse regions by leveraging reanalysis products and encoding physically meaningful associations. Second, robust validation designs are essential for avoiding overestimation of skill and for quantifying performance under non-stationarity. Third, combining predictive accuracy with transparent explanations enables practical application in agriculture, health, and energy sectors, supporting climate-informed risk management.
All components of the pipeline, including data preprocessing, anomaly computation, model training, and interpretability diagnostics, are released as reproducible artifacts. These provide a foundation for extension to higher spatial resolution, longer lead times, and integration with dynamical forecasts. By offering both a methodological template and empirical insights, the work contributes to advancing climate services in West Africa and demonstrates the role of interpretable data-driven methods as complements to existing forecasting systems.
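As an illustration of the anomaly computation and lagged-feature steps named above, the following sketch removes a monthly climatology from a series and builds a lag-1 predictor. It is not the released pipeline: the function names, the optional reference-period mask, and the synthetic temperature series are assumptions.

```python
import numpy as np

def monthly_anomalies(values, months, ref_mask=None):
    """Subtract each calendar month's mean, computed over the reference samples."""
    values, months = np.asarray(values, float), np.asarray(months)
    if ref_mask is None:
        ref_mask = np.ones(len(values), dtype=bool)  # use the full record
    ref_mask = np.asarray(ref_mask, dtype=bool)
    anom = np.full(len(values), np.nan)
    for m in range(1, 13):
        sel = months == m
        if np.any(sel & ref_mask):
            anom[sel] = values[sel] - values[sel & ref_mask].mean()
    return anom

def lagged(series, lag):
    """Shift a series forward by `lag` steps (lag >= 1), padding the start with NaN."""
    series = np.asarray(series, float)
    out = np.full(len(series), np.nan)
    out[lag:] = series[:len(series) - lag]
    return out

# Synthetic 3-year monthly series: seasonal cycle plus a year-to-year offset.
months = np.tile(np.arange(1, 13), 3)
cycle = 27.0 + 3.0 * np.sin(2 * np.pi * (months - 1) / 12)
offset = np.repeat([-0.5, 0.0, 0.5], 12)
t2m = cycle + offset

anom = monthly_anomalies(t2m, months)
lag1 = lagged(anom, 1)   # previous month's anomaly as a persistence predictor
```

In a forecasting setting the climatology would normally be estimated from the training period only (via `ref_mask`), so that holdout years do not influence the reference mean.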
Abbreviations
ACC | Anomaly Correlation Coefficient |
DJF | December–January–February |
EBM | Explainable Boosting Machine |
ENSO | El Niño–Southern Oscillation |
ERA5 | ECMWF Reanalysis 5 |
GAM | Generalized Additive Models |
JJA | June–July–August |
MAE | Mean Absolute Error |
MAM | March–April–May |
RMSE | Root Mean Square Error |
SST | Sea Surface Temperature |
SON | September–October–November |
TCWV | Total Column Water Vapour |
T2M | 2-meter Air Temperature |
Author Contributions
Oni Damilola: Conceptualization, Data curation, Investigation, Methodology, Supervision
Chukwuka Christian Ofoegbu: Project administration, Supervision, Writing – original draft, Writing – review & editing
Okon Paul: Conceptualization, Formal Analysis, Writing – review & editing
Taiwo Ikeoluwa Odunayo: Investigation, Project administration, Resources
Akinyooye Demilade Emmanuel: Writing – review & editing
Data Availability Statement
The data supporting the outcome of this research work has been reported in this manuscript and is available on request.
Conflicts of Interest
The authors declare no conflicts of interest.
References
[1] S. Diatta and A. H. Fink, “Statistical relationship between remote climate indices and West African monsoon variability,” Int. J. Climatol., vol. 34, no. 12, pp. 3348-3367, Oct. 2014, https://doi.org/10.1002/joc.3912
[2] “Temperature response functions for residential energy demand - A review of models - ScienceDirect.” Accessed: Sep. 06, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/abs/pii/S2212095516300013
[3] R. Fazeli, M. Ruth, and B. Davidsdottir, “Temperature response functions for residential energy demand - A review of models,” Urban Clim., vol. 15, pp. 45-59, Mar. 2016, https://doi.org/10.1016/j.uclim.2016.01.001
[4] D. E. Parker, “Urban heat island effects on estimates of observed climate change,” WIREs Clim. Change, vol. 1, no. 1, pp. 123-133, 2010, https://doi.org/10.1002/wcc.21
[5] “Recent Progress and Future Prospects of Subseasonal and Seasonal Climate Predictions in: Bulletin of the American Meteorological Society Volume 101 Issue 5 (2020).” Accessed: Sep. 06, 2025. [Online]. Available: https://journals.ametsoc.org/view/journals/bams/101/5/bams-d-19-0300.1.xml
[6] “Empirical‐statistical downscaling in climate modeling - Benestad - 2004 - Eos, Transactions American Geophysical Union - Wiley Online Library.” Accessed: Sep. 06, 2025. [Online]. Available: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2004eo420002
[7] “PBC-ML: Predicting Breast Cancer in Humans using Machine Learning Approach.” Accessed: Sep. 06, 2025. [Online]. Available: https://openaccess.cms-conferences.org/publications/book/978-1-958651-54-4/article/978-1-958651-54-4_5
[8] D. F. N. Oliveira et al., “A New Interpretable Unsupervised Anomaly Detection Method Based on Residual Explanation,” IEEE Access, vol. 10, pp. 1401-1409, 2022, https://doi.org/10.1109/ACCESS.2021.3137633
[9] D. Oni, S. Mishra, L. T. Thanh, V. M. Phuc, and Y. Pham, “Detecting Stroke in Human Beings using Machine Learning,” in Health Informatics and Biomedical Engineering Applications, AHFE Open Access, 2023, https://doi.org/10.54941/ahfe1003460
[10] O. Damilola, “CYBER SECURITY AWARENESS IN DEVELOPING COUNTRIES IN AFRICA: LESSONS FROM NIGERIA,” 2024.
[11] “Influence of Soil Moisture Anomaly on Temperature in the Sahel: A Comparison between Wet and Dry Decades in: Journal of Hydrometeorology Volume 4 Issue 2 (2003).” Accessed: Sep. 06, 2025. [Online]. Available: https://journals.ametsoc.org/view/journals/hydr/4/2/1525-7541_2003_4_437_iosmao_2_0_co_2.xml
[12] B. Koné et al., “Influence of initial soil moisture in a regional climate model study over West Africa - Part 1: Impact on the climate mean,” Hydrol. Earth Syst. Sci., vol. 26, no. 3, pp. 711-730, Feb. 2022, https://doi.org/10.5194/hess-26-711-2022
[13] C. O. de Burgh-Day and T. Leeuwenburg, “Machine learning for numerical weather and climate modelling: a review,” Geosci. Model Dev., vol. 16, no. 22, pp. 6433-6477, Nov. 2023, https://doi.org/10.5194/gmd-16-6433-2023
[14] C. M. Taylor, P. P. Harris, and D. J. Parker, “Impact of soil moisture on the development of a Sahelian mesoscale convective system: a case-study from the AMMA Special Observing Period,” Q. J. R. Meteorol. Soc., vol. 136, no. S1, pp. 456-470, 2010, https://doi.org/10.1002/qj.465
[15] “[1901.04592] Interpretable machine learning: definitions, methods, and applications.” Accessed: Sep. 06, 2025. [Online]. Available: https://arxiv.org/abs/1901.04592
[16] “Development and application of a mesoscale climate model for the tropics: Influence of sea surface temperature anomalies on the West African monsoon - Vizy - 2002 - Journal of Geophysical Research: Atmospheres - Wiley Online Library.” Accessed: Sep. 06, 2025. [Online]. Available: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2001JD000686
[17] D. Oni, E. Arshad, and B. Pham, Cybercrime On Social Media In Nigeria: Trends, Scams, Vulnerabilities and Prevention, vol. 2. 2023. https://doi.org/10.22624/AIMS/CSEAN-SMART2023P17
[18] “Quantifying the Effect of Climate Change on Midlatitude Subseasonal Prediction Skill Provided by the Tropics - Mayer - 2022 - Geophysical Research Letters - Wiley Online Library.” Accessed: Sep. 06, 2025. [Online]. Available: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2022GL098663
[19] B. Fontaine, S. Louvet, and P. Roucou, “Fluctuations in annual cycles and inter-seasonal memory in West Africa: rainfall, soil moisture and heat fluxes,” Theor. Appl. Climatol., vol. 88, no. 1, pp. 57-70, Jan. 2007, https://doi.org/10.1007/s00704-006-0246-4
[20] “Operationally meaningful representations of physical systems in neural networks - IOPscience.” Accessed: Sep. 06, 2025. [Online]. Available: https://iopscience.iop.org/article/10.1088/2632-2153/ac9ae8/meta
[21] “Operational meteorology in West Africa: observational networks, weather analysis and forecasting - Fink - 2011 - Atmospheric Science Letters - Wiley Online Library.” Accessed: Sep. 06, 2025. [Online]. Available: https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/asl.324
[22] “Interpretable and Explainable Machine Learning for Materials Science and Chemistry | Accounts of Materials Research.” Accessed: Sep. 06, 2025. [Online]. Available: https://pubs.acs.org/doi/full/10.1021/accountsmr.1c00244
Cite This Article
APA Style
Damilola, O., Ofoegbu, C. C., Paul, O., Odunayo, T. I., Emmanuel, A. D. (2025). Reanalysis-driven Prediction of Monthly Temperature Anomalies in West Africa Using Interpretable Machine Learning. American Journal of Artificial Intelligence, 9(2), 310-323. https://doi.org/10.11648/j.ajai.20250902.30
ACS Style
Damilola, O.; Ofoegbu, C. C.; Paul, O.; Odunayo, T. I.; Emmanuel, A. D. Reanalysis-driven Prediction of Monthly Temperature Anomalies in West Africa Using Interpretable Machine Learning. Am. J. Artif. Intell. 2025, 9(2), 310-323. doi: 10.11648/j.ajai.20250902.30
AMA Style
Damilola O, Ofoegbu CC, Paul O, Odunayo TI, Emmanuel AD. Reanalysis-driven Prediction of Monthly Temperature Anomalies in West Africa Using Interpretable Machine Learning. Am J Artif Intell. 2025;9(2):310-323. doi: 10.11648/j.ajai.20250902.30
@article{10.11648/j.ajai.20250902.30,
author = {Oni Damilola and Chukwuka Christian Ofoegbu and Okon Paul and Taiwo Ikeoluwa Odunayo and Akinyooye Demilade Emmanuel},
title = {Reanalysis-driven Prediction of Monthly Temperature Anomalies in West Africa Using Interpretable Machine Learning},
journal = {American Journal of Artificial Intelligence},
volume = {9},
number = {2},
pages = {310-323},
doi = {10.11648/j.ajai.20250902.30},
url = {https://doi.org/10.11648/j.ajai.20250902.30},
eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajai.20250902.30},
year = {2025}
}
TY - JOUR
T1 - Reanalysis-driven Prediction of Monthly Temperature Anomalies in West Africa Using Interpretable Machine Learning
AU - Oni Damilola
AU - Chukwuka Christian Ofoegbu
AU - Okon Paul
AU - Taiwo Ikeoluwa Odunayo
AU - Akinyooye Demilade Emmanuel
Y1 - 2025/12/26
PY - 2025
N1 - https://doi.org/10.11648/j.ajai.20250902.30
DO - 10.11648/j.ajai.20250902.30
T2 - American Journal of Artificial Intelligence
JF - American Journal of Artificial Intelligence
JO - American Journal of Artificial Intelligence
SP - 310
EP - 323
PB - Science Publishing Group
SN - 2639-9733
UR - https://doi.org/10.11648/j.ajai.20250902.30
VL - 9
IS - 2
ER -