Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants
American Journal of Theoretical and Applied Statistics
Volume 5, Issue 4, July 2016, Pages: 186-191
Received: May 5, 2016; Accepted: May 18, 2016; Published: Jun. 7, 2016
Authors
Mutua Jennifer Ndanu, Applied Statistics, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
Gichuhi Anthony Waititu, Statistics, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
Wanjoya Anthony Kiberia, Statistics, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
Muia Patricia Nthoki, Education, Department of Educational Administration and Planning, University of Nairobi, Nairobi, Kenya
Abstract
Credit risk for an applicant is the probability that the applicant defaults on repayment of a credit facility extended by a commercial bank. Credit scoring models are developed to improve the efficiency of decisions on credit risk. The objectives of this research are to classify credit applicants using cluster analysis, Artificial Neural Network and K-Nearest Neighbour techniques, and to compare their predictive accuracy. The dataset was first split, with 70% of the data used for training and the remaining 30% used for testing. Finally, the ability of the developed models to forecast trends was investigated. A cluster is assumed to be homogeneous if its members have a high degree of similarity. The analysis is based on credit data provided by commercial banks in Kenya, which was used to test the effectiveness of the cluster analysis, K-Nearest Neighbour (K-NN) and artificial neural network (ANN) models. A confusion matrix was used to determine which model had the best classification accuracy, and the chi-square test was used to test goodness of fit. From the results, the study concluded that ANN predicted the classification of credit applicants better than K-NN and cluster analysis.
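The evaluation procedure described in the abstract (a 70/30 train/test split, a K-NN classifier, and a confusion matrix summarised by the overall accuracy rate) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic two-feature points rather than the Kenyan bank data, and the function names, the labels "good" and "bad", and the choice k = 5 are assumptions made only for illustration.

```python
import random
from collections import Counter


def split_70_30(rows, seed=0):
    """Shuffle the rows and return (training 70%, testing 30%)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(0.7 * len(rows))
    return rows[:cut], rows[cut:]


def knn_predict(train, x, k=5):
    """Majority label among the k training points closest to x
    (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def confusion_matrix(actual, predicted, labels):
    """Nested dict m[actual_label][predicted_label] = count."""
    m = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        m[a][p] += 1
    return m


def overall_accuracy(m):
    """Share of test cases on the diagonal of the confusion matrix."""
    total = sum(sum(row.values()) for row in m.values())
    correct = sum(m[label][label] for label in m)
    return correct / total


def make_synthetic(n=200, seed=1):
    """Hypothetical applicants: two numeric features and a label.
    Assumed data, used only so the sketch can run end to end."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = "good" if i % 2 == 0 else "bad"
        centre = 1.0 if label == "good" else -1.0
        features = (rng.gauss(centre, 0.5), rng.gauss(centre, 0.5))
        rows.append((features, label))
    return rows


labels = ["good", "bad"]
data = make_synthetic()
train, test = split_70_30(data)
predicted = [knn_predict(train, x) for x, _ in test]
cm = confusion_matrix([y for _, y in test], predicted, labels)
accuracy = overall_accuracy(cm)
print(cm)
print(f"overall accuracy rate: {accuracy:.3f}")
```

The same confusion matrix and accuracy functions could be applied to the predictions of any of the three models, which is what makes the accuracy rates directly comparable.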
Keywords
Cluster Analysis, ANN: Artificial Neural Network, K-NN: K-Nearest Neighbour, Credit Risk, Overall Accuracy Rate, SSE: Sum of Square Errors
To cite this article
Mutua Jennifer Ndanu, Gichuhi Anthony Waititu, Wanjoya Anthony Kiberia, Muia Patricia Nthoki, Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants, American Journal of Theoretical and Applied Statistics. Vol. 5, No. 4, 2016, pp. 186-191. doi: 10.11648/j.ajtas.20160504.14
Copyright
Copyright © 2016 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.