American Journal of Data Mining and Knowledge Discovery

Deep Learning for Sentiment Analysis to Predict the Probability of Bank Loan Default

Social networks have spread across the world at remarkable speed, and they now carry discussion of social, political, and many other kinds of events. People around the globe voice their opinions on social media, which makes it well suited to opinion mining. Organizations that aim to refine their products and services use sentiment analysis methods to increase their resources. In the banking and financial industry, sentiment analysis of Twitter and Facebook posts makes it much easier to gather feedback from customers. Consumers and service providers who want to understand how people relate to their banks and financial portfolios in daily life therefore cannot ignore such analysis. Hence, this study aims to predict the probability of bank loan default and to classify Twitter messages using deep learning algorithms. A hyper-parameter space for grid search (HPSGS) and hyper-parameter optimization (HPO) are developed on high-performance computing, and the effectiveness of three gradient boosting decision tree algorithms is compared. The results reveal that the XGBoost algorithm predicts better than the other algorithms, scoring 91 percent on the test data and 93 percent on the validation data. Across all the algorithms, women also show a higher likelihood of default than men. These results are useful to decision-makers and the financial sector for future planning in credit risk and in areas prone to bank loan default.

Bank Loan, Credit Risk, Data Mining, Deep Learning, Machine Learning, Sentiment Analysis
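
The abstract names a hyper-parameter grid search over gradient boosting decision trees but gives no detail of the procedure. The following is a minimal sketch of that general idea, not the author's implementation: it exhaustively tries every combination in a small grid, fits a toy gradient-boosted model of one-split trees on the log-loss residuals, and keeps the combination with the best validation accuracy. The synthetic data, the grid values, and every function name (`make_data`, `fit_boosted`, `grid_search`, and so on) are assumptions introduced for illustration; the study itself uses Twitter and loan data and full boosting libraries.

```python
import itertools
import math
import random


def make_data(n, seed):
    """Synthetic rows of (features, label); label 1 stands in for a default."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 1 if x[0] + 0.5 * x[1] + rng.gauss(0, 0.2) > 0 else 0
        rows.append((x, y))
    return rows


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def stump_value(stump, x):
    """Output of a one-split tree for feature vector x."""
    j, t, left_value, right_value = stump
    return left_value if x[j] <= t else right_value


def fit_stump(rows, residuals):
    """Pick the one-split tree that minimises squared error on the residuals."""
    best = None
    for j in range(len(rows[0][0])):
        for t in (-0.5, 0.0, 0.5):  # fixed candidate thresholds (assumption)
            left = [r for (x, _), r in zip(rows, residuals) if x[j] <= t]
            right = [r for (x, _), r in zip(rows, residuals) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]


def fit_boosted(rows, n_rounds, learning_rate):
    """Gradient boosting on log-loss: each stump fits the residual y - p.

    Leaf values are plain residual means scaled by the learning rate, a
    simplification of the Newton-style leaf updates used by real libraries.
    """
    stumps, scores = [], [0.0] * len(rows)
    for _ in range(n_rounds):
        residuals = [y - sigmoid(s) for (_, y), s in zip(rows, scores)]
        j, t, lm, rm = fit_stump(rows, residuals)
        stump = (j, t, learning_rate * lm, learning_rate * rm)
        stumps.append(stump)
        scores = [s + stump_value(stump, x) for (x, _), s in zip(rows, scores)]
    return stumps


def predict(stumps, x):
    """Predict 1 when the summed score is positive (probability above 0.5)."""
    return 1 if sum(stump_value(s, x) for s in stumps) > 0 else 0


def accuracy(stumps, rows):
    return sum(predict(stumps, x) == y for x, y in rows) / len(rows)


def grid_search(train, valid, grid):
    """Try every combination in the grid; keep the best validation accuracy."""
    best_params, best_score = None, -1.0
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = accuracy(fit_boosted(train, **params), valid)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


train, valid = make_data(200, seed=0), make_data(100, seed=1)
grid = {"n_rounds": [5, 20], "learning_rate": [0.1, 0.5]}  # illustrative values
best_params, best_score = grid_search(train, valid, grid)
print(best_params, round(best_score, 3))
```

Selecting on a held-out validation set and reporting on a separate test set, as the abstract does with its validation and test scores, keeps the tuning step from inflating the final estimate. A grid of this kind grows multiplicatively with each added hyper-parameter, which is one plausible reason the study pairs it with high-performance computing.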

APA Style

Katleho Makatjane. (2023). Deep Learning for Sentiment Analysis to Predict the Probability of Bank Loan Default. American Journal of Data Mining and Knowledge Discovery, 7(2), 5-12.

ACS Style

Katleho Makatjane. Deep Learning for Sentiment Analysis to Predict the Probability of Bank Loan Default. Am. J. Data Min. Knowl. Discov. 2023, 7(2), 5-12. doi: 10.11648/j.ajdmkd.20220702.11

AMA Style

Katleho Makatjane. Deep Learning for Sentiment Analysis to Predict the Probability of Bank Loan Default. Am J Data Min Knowl Discov. 2023;7(2):5-12. doi: 10.11648/j.ajdmkd.20220702.11

Copyright © 2022 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
