Deep learning models have achieved remarkable predictive performance, yet their adoption in healthcare is hindered by a lack of interpretability. This is especially critical in high-stakes environments such as Intensive Care Units (ICUs), where transparency in decision-making is essential. Several studies have documented a trade-off between performance and interpretability, with one typically sacrificed for the other. This study aims to demonstrate that performance and interpretability need not be mutually exclusive by proposing a hybrid framework that pairs an LSTM, a deep learning architecture, with post-hoc explanation techniques such as SHAP for ICU mortality prediction using Electronic Health Record (EHR) data. The study employs publicly available ICU datasets (MIMIC-III/MIMIC-IV), which contain comprehensive EHR data for ICU patients. The LSTM achieved an accuracy of 98.6% and a recall of 87.5% on unseen data but recorded low precision, indicating that the model was biased toward the majority class (no mortality). When compared with baseline models (Random Forest and Logistic Regression), the LSTM generally outperformed them. The major limitation is the class imbalance within the dataset, as reflected in the low precision. Despite this, the LSTM model remained interpretable through SHAP without compromising predictive performance, thereby achieving a balance between accuracy, transparency, and clinical relevance.
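To make the hybrid LSTM-plus-SHAP pipeline concrete, the sketch below shows one way such a framework could be wired together. It is a minimal illustration, not the paper's implementation: the Keras architecture, the synthetic stand-in data, the feature and timestep dimensions, and the choice of shap.GradientExplainer are all assumptions made here for demonstration; a real run would use preprocessed MIMIC-III/IV time-series features instead.

```python
# Minimal sketch of an LSTM + SHAP mortality-prediction pipeline.
# Assumptions (not from the paper): a small Keras LSTM, synthetic data shaped
# (patients, timesteps, features), and shap.GradientExplainer for attributions.
import numpy as np
import shap
from tensorflow import keras

n_patients, n_timesteps, n_features = 500, 48, 10      # hypothetical 48-hour EHR window
rng = np.random.default_rng(0)
X = rng.normal(size=(n_patients, n_timesteps, n_features)).astype("float32")
y = (rng.random(n_patients) < 0.1).astype("float32")   # ~10% positives, mimicking class imbalance

model = keras.Sequential([
    keras.Input(shape=(n_timesteps, n_features)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1, activation="sigmoid"),        # P(in-hospital mortality)
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy", keras.metrics.Recall(), keras.metrics.Precision()],
)
model.fit(X, y, epochs=2, batch_size=64, validation_split=0.2, verbose=0)

# Post-hoc interpretability: SHAP attributions per feature per timestep.
background = X[:50]                                     # reference sample for the explainer
explainer = shap.GradientExplainer(model, background)
shap_values = explainer.shap_values(X[:5])              # explain 5 patients

# Global importance: mean |SHAP| per input feature, aggregated over patients and time.
vals = np.asarray(shap_values).reshape(-1, n_timesteps, n_features)
print(np.abs(vals).mean(axis=(0, 1)))                   # one importance score per feature
```

Averaging absolute SHAP values over patients and timesteps, as in the last lines, is one common way to turn per-prediction attributions into a global feature-importance ranking that can be discussed with clinicians.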
| Published in | International Journal of Intelligent Information Systems (Volume 14, Issue 6) |
| DOI | 10.11648/j.ijiis.20251406.11 |
| Page(s) | 102-120 |
| Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
| Copyright | Copyright © The Author(s), 2025. Published by Science Publishing Group |
Long Short-Term Memory (LSTM), Interpretability, Explainable AI (XAI), SHapley Additive exPlanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), Electronic Health Record (EHR), Trade-off
APA Style
Adebayo, R. P. (2025). Bridging the Gap Between Accuracy and Interpretability: A Hybrid LSTM Approach with SHAP for ICU Mortality Prediction Using EHR Data. International Journal of Intelligent Information Systems, 14(6), 102-120. https://doi.org/10.11648/j.ijiis.20251406.11
ACS Style
Adebayo, R. P. Bridging the Gap Between Accuracy and Interpretability: A Hybrid LSTM Approach with SHAP for ICU Mortality Prediction Using EHR Data. Int. J. Intell. Inf. Syst. 2025, 14(6), 102-120. doi: 10.11648/j.ijiis.20251406.11