Research Article | Peer-Reviewed

Bridging the Gap Between Accuracy and Interpretability: A Hybrid LSTM Approach with SHAP for ICU Mortality Prediction Using EHR Data

Received: 27 September 2025     Accepted: 10 October 2025     Published: 29 December 2025
Abstract

While deep learning models have achieved remarkable predictive performance, their adoption in healthcare is hindered by a lack of interpretability. This is especially pressing in high-stakes environments such as Intensive Care Units (ICUs), where transparency in decision-making is not merely desirable but essential. Several studies have documented a trade-off between performance and interpretability, with one typically sacrificed for the other. This study therefore aims to demonstrate that performance and interpretability need not be mutually exclusive by proposing a hybrid framework that integrates a Long Short-Term Memory (LSTM) network with the SHapley Additive exPlanations (SHAP) explainability method for ICU mortality prediction using Electronic Health Record (EHR) data. The study employs publicly available ICU datasets (MIMIC-III/MIMIC-IV), which contain comprehensive EHR data for ICU patients. The LSTM achieved an accuracy of 98.6% and a recall of 87.5% on unseen data, but recorded low precision, indicating that the model was biased toward the majority class (No Mortality). Compared with the baseline models (Random Forest and Logistic Regression), the LSTM generally performed better. The major limitation is the class imbalance in the dataset, as reflected in the low precision. Despite this, the LSTM model maintained interpretability through SHAP without compromising predictive performance, thereby achieving a balance between accuracy, transparency, and clinical relevance.
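The abstract does not specify implementation details, so the following is a minimal sketch of the kind of pipeline it describes, assuming a Keras LSTM over fixed-length hourly feature windows. Synthetic data stands in for the credentialed MIMIC-III/MIMIC-IV tables, and the model-agnostic KernelExplainer is used for the SHAP step rather than whichever explainer the study actually employed; all shapes and hyperparameters are illustrative assumptions.

import numpy as np
import shap
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical setup: 24 hourly time steps, 12 vitals/labs per step, ~8% mortality rate.
# Synthetic data only; MIMIC-III/MIMIC-IV require credentialed access.
n_patients, timesteps, n_features = 1000, 24, 12
rng = np.random.default_rng(42)
X = rng.normal(size=(n_patients, timesteps, n_features)).astype("float32")
y = rng.binomial(1, 0.08, size=n_patients)

# LSTM binary classifier for in-ICU mortality.
model = models.Sequential([
    layers.Input(shape=(timesteps, n_features)),
    layers.LSTM(64),
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
model.fit(X, y, epochs=3, batch_size=64, validation_split=0.2, verbose=0)

# SHAP step: flatten each (timesteps x features) window so the model-agnostic
# KernelExplainer can treat it as tabular input, then reshape inside the wrapper.
def predict_flat(x_flat):
    return model.predict(x_flat.reshape(-1, timesteps, n_features), verbose=0).ravel()

background = X[:25].reshape(25, -1)  # small background sample to keep the explainer tractable
explainer = shap.KernelExplainer(predict_flat, background)
shap_values = explainer.shap_values(X[:5].reshape(5, -1), nsamples=200)
print(np.array(shap_values).shape)   # per-timestep, per-feature attributions for 5 patients

KernelExplainer is chosen here only because it makes no assumptions about the network's internals; DeepExplainer and GradientExplainer are common alternatives for Keras models when gradient access is available. Note also that with a positive rate of only a few percent, as in this sketch, accuracy alone overstates performance, which is why precision and recall are reported alongside it.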

Published in International Journal of Intelligent Information Systems (Volume 14, Issue 6)
DOI 10.11648/j.ijiis.20251406.11
Page(s) 102-120
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Long Short-Term Memory (LSTM), Interpretability, Explainable AI (XAI), SHapley Additive exPlanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), Electronic Health Record (EHR), Trade-off

Cite This Article
  • APA Style

    Adebayo, R. P. (2025). Bridging the Gap Between Accuracy and Interpretability: A Hybrid LSTM Approach with SHAP for ICU Mortality Prediction Using EHR Data. International Journal of Intelligent Information Systems, 14(6), 102-120. https://doi.org/10.11648/j.ijiis.20251406.11


  • ACS Style

    Adebayo, R. P. Bridging the Gap Between Accuracy and Interpretability: A Hybrid LSTM Approach with SHAP for ICU Mortality Prediction Using EHR Data. Int. J. Intell. Inf. Syst. 2025, 14(6), 102-120. doi: 10.11648/j.ijiis.20251406.11


  • AMA Style

    Adebayo RP. Bridging the Gap Between Accuracy and Interpretability: A Hybrid LSTM Approach with SHAP for ICU Mortality Prediction Using EHR Data. Int J Intell Inf Syst. 2025;14(6):102-120. doi: 10.11648/j.ijiis.20251406.11
