Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants

Mutua Jennifer Ndanu; Gichuhi Anthony Waititu; Wanjoya Anthony Kiberia; Muia Patricia Nthoki

doi:10.11648/j.ajtas.20160504.14

Peer-Reviewed

Published in American Journal of Theoretical and Applied Statistics (Volume 5, Issue 4)

Received: 5 May 2016; Accepted: 18 May 2016; Published: 7 June 2016

Abstract

Credit risk for an applicant is the probability of default on repayment of a credit facility extended by a commercial bank. Credit scoring models are developed to improve the efficiency of decision making on credit risk. The objectives of this research are to classify credit applicants using cluster analysis, Artificial Neural Network and K-Nearest Neighbour techniques and to compare their predictive accuracy. The dataset was first split so that 70% of the data was used for training and the remaining 30% for testing. Finally, the ability of the developed models to forecast trends was investigated. A cluster is assumed to be homogeneous if its members have a high degree of similarity. The analysis is based on credit data provided by commercial banks in Kenya, which was used to test the effectiveness of the cluster analysis, K-Nearest Neighbour (K-NN) and Artificial Neural Network (ANN) models. A confusion matrix was used to determine which model had the best classification accuracy, and the chi-square test was used to test goodness of fit. From the results of the study, the researcher concluded that ANN was better at predicting the classification of credit applicants than K-NN and cluster analysis.
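The evaluation workflow the abstract describes (a 70/30 train/test split, a classifier, and a confusion matrix summarised as an overall accuracy rate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the two features are hypothetical, k = 5 is an arbitrary choice, and only the K-NN model is shown.

```python
import math
import random
from collections import Counter


def make_synthetic_applicants(n, seed=0):
    """Hypothetical data: two numeric features and a 0/1 default label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = defaulted, 0 = repaid (assumed coding)
        # Defaulters are drawn around a different centre than non-defaulters.
        centre = (2.0, 2.0) if label == 1 else (0.0, 0.0)
        features = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
        rows.append((features, label))
    return rows


def split_70_30(rows, seed=0):
    """Shuffle a copy, then keep 70% for training and 30% for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, point, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_matrix(train, test, k=5):
    """Return a 2x2 matrix m[actual][predicted] counted over the test set."""
    m = [[0, 0], [0, 0]]
    for features, actual in test:
        m[actual][knn_predict(train, features, k)] += 1
    return m


def overall_accuracy(m):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in m)
    return (m[0][0] + m[1][1]) / total


rows = make_synthetic_applicants(200)
train, test = split_70_30(rows)
matrix = confusion_matrix(train, test)
accuracy = overall_accuracy(matrix)
print(matrix, round(accuracy, 3))
```

Comparing models then amounts to computing the same matrix for each classifier on the same test set and comparing the resulting accuracy rates.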

Published in	American Journal of Theoretical and Applied Statistics (Volume 5, Issue 4)
DOI	10.11648/j.ajtas.20160504.14
Page(s)	186-191
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2016. Published by Science Publishing Group

Keywords

Cluster Analysis, ANN: Artificial Neural Network, K-NN: K-Nearest Neighbour, Credit Risk, Overall Accuracy Rate, SSE: Sum of Square Errors

Cite This Article

APA Style

Mutua Jennifer Ndanu, Gichuhi Anthony Waititu, Wanjoya Anthony Kiberia, Muia Patricia Nthoki. (2016). Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants. American Journal of Theoretical and Applied Statistics, 5(4), 186-191. https://doi.org/10.11648/j.ajtas.20160504.14

ACS Style

Mutua Jennifer Ndanu; Gichuhi Anthony Waititu; Wanjoya Anthony Kiberia; Muia Patricia Nthoki. Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants. Am. J. Theor. Appl. Stat. 2016, 5(4), 186-191. doi: 10.11648/j.ajtas.20160504.14

AMA Style

Mutua Jennifer Ndanu, Gichuhi Anthony Waititu, Wanjoya Anthony Kiberia, Muia Patricia Nthoki. Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants. Am J Theor Appl Stat. 2016;5(4):186-191. doi: 10.11648/j.ajtas.20160504.14

@article{10.11648/j.ajtas.20160504.14,
  author = {Mutua Jennifer Ndanu and Gichuhi Anthony Waititu and Wanjoya Anthony Kiberia and Muia Patricia Nthoki},
  title = {Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants},
  journal = {American Journal of Theoretical and Applied Statistics},
  volume = {5},
  number = {4},
  pages = {186--191},
  doi = {10.11648/j.ajtas.20160504.14},
  url = {https://doi.org/10.11648/j.ajtas.20160504.14},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20160504.14},
  abstract = {Credit risk for an applicant is the probability of default on repayment of a credit facility extended by a commercial bank. Credit scoring models are developed to improve the efficiency of decision making on credit risk. The objectives of this research are to classify credit applicants using cluster analysis, Artificial Neural Network and K-Nearest Neighbour techniques and to compare their predictive accuracy. The dataset was first split so that 70\% of the data was used for training and the remaining 30\% for testing. Finally, the ability of the developed models to forecast trends was investigated. A cluster is assumed to be homogeneous if its members have a high degree of similarity. The analysis is based on credit data provided by commercial banks in Kenya, which was used to test the effectiveness of the cluster analysis, K-Nearest Neighbour (K-NN) and Artificial Neural Network (ANN) models. A confusion matrix was used to determine which model had the best classification accuracy, and the chi-square test was used to test goodness of fit. From the results of the study, the researcher concluded that ANN was better at predicting the classification of credit applicants than K-NN and cluster analysis.},
  year = {2016}
}

TY - JOUR
T1 - Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants
AU - Mutua Jennifer Ndanu
AU - Gichuhi Anthony Waititu
AU - Wanjoya Anthony Kiberia
AU - Muia Patricia Nthoki
Y1 - 2016/06/07
PY - 2016
N1 - https://doi.org/10.11648/j.ajtas.20160504.14
DO - 10.11648/j.ajtas.20160504.14
T2 - American Journal of Theoretical and Applied Statistics
JF - American Journal of Theoretical and Applied Statistics
JO - American Journal of Theoretical and Applied Statistics
SP - 186
EP - 191
PB - Science Publishing Group
SN - 2326-9006
UR - https://doi.org/10.11648/j.ajtas.20160504.14
AB - Credit risk for an applicant is the probability of default on repayment of a credit facility extended by a commercial bank. Credit scoring models are developed to improve the efficiency of decision making on credit risk. The objectives of this research are to classify credit applicants using cluster analysis, Artificial Neural Network and K-Nearest Neighbour techniques and to compare their predictive accuracy. The dataset was first split so that 70% of the data was used for training and the remaining 30% for testing. Finally, the ability of the developed models to forecast trends was investigated. A cluster is assumed to be homogeneous if its members have a high degree of similarity. The analysis is based on credit data provided by commercial banks in Kenya, which was used to test the effectiveness of the cluster analysis, K-Nearest Neighbour (K-NN) and Artificial Neural Network (ANN) models. A confusion matrix was used to determine which model had the best classification accuracy, and the chi-square test was used to test goodness of fit. From the results of the study, the researcher concluded that ANN was better at predicting the classification of credit applicants than K-NN and cluster analysis.
VL - 5
IS - 4
ER -

Author Information

Mutua Jennifer Ndanu
Applied Statistics, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

Gichuhi Anthony Waititu
Statistics, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

Wanjoya Anthony Kiberia
Statistics, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

Muia Patricia Nthoki
Education, Department of Educational Administration and Planning, University of Nairobi, Nairobi, Kenya
