Need a perfect paper? Place your first order and save 5% with this code:   SAVE5NOW

Creditworthiness of Customers

INTRODUCTION

This study seeks to investigate the creditworthiness of customers in Taiwan, as one of the primary goals of a lending institution is to keenly assess and manage both the customer and the credit by ensuring that the credit is available, evaluating the related risks, optimizing the returns, and most importantly addressing customer’s payment defaults. The majority of lending institutions have aimed to ensure that they have full control of the risks associated with providing loans; here, associating independent third-party evaluations to assess the creditworthiness of clients has helped mitigate the risks related to credit defaults. Evaluating the creditworthiness of an individual is of utmost importance for banks and financial organizations. The complexity of assessing credit risk depends on a number of variables, such as an individual’s financial situation, the state of the economy, and anticipated market volatility. Traditional approaches to assessing credit risk have been employed for an extended period. Still, they may not be well equipped to handle the vast and intricate data that is accessible in the present digital age. This project aims to create and validate a credit risk assessment mechanism using the Support Vector Machine. This learning technique improves banks’ ability to detect credit-related difficulties by effectively utilizing the ample data accessible to individuals. The suggested methodology seeks to leverage machine learning techniques to provide a comprehensive assessment of individuals’ creditworthiness. The objective of this method is to reduce the probability of loan defaults by empowering banks to make informed and tactical lending choices.

Furthermore, the possible implementation of such a credit risk assessment system could provide multiple benefits to financial institutions. The installation of this system is expected to result in enhanced credit risk evaluation, thereby reducing the bank’s non-performing loan ratio and increasing overall profit margins. The analysis will be conducted in SAS enterprise miner, where the results will be presented in both table format and graphs. The dataset for this research was retrieved via the link Default of Credit Card Clients Dataset (kaggle.com). It consisted of 25 features with 30000 observations, of which there was a single binary target, nine nominal inputs, and 14 interval inputs. The descriptive statistics for the continuous variables were as follows.

Table 1 Descriptive Statistics Summary

Variable Mean Deviation Missing Missing Minimum Median Maximum Skewness Kurtosis
AGE 35.4855 9.217904 30000 0 21 34 79 0.732246 0.044303
BILL_AMT1 51223.33 73635.86 30000 0 -165580 22381 964511 2.663861 9.806289
BILL_AMT2 49179.08 71173.77 30000 0 -69777 21197 983931 2.705221 10.30295
BILL_AMT3  47013.15 69349.39 30000 0 -157264 20088 1664089 3.08783 19.78326
BILL_AMT4 43262.95 64332.86 30000 0 -170000 19052 891586 2.821965 11.30932
BILL_AMT5 40311.4 60797.16 30000 0 -81334 18104 927171 2.87638 12.30588
BILL_AMT6 38871.76 59554.11 30000 0 -339603 17068 961664 2.846645 12.27071
LIMIT_BAL 167484.3 129747.7 30000 0 10000 140000 1000000 0.992867 0.536263
PAY_AMT1  5663.581 16563.28 30000 0 0 2100 873552 14.66836 415.2547
PAY_AMT2  5921.164 23040.87 30000 0 0 2009 1684259 30.45382 1641.632
PAY_AMT3  5225.682 17606.96 30000 0 0 1800 896040 17.21664 564.3112
PAY_AMT4  4826.077 15666.16 30000 0 0 1500 621000 12.90498 277.3338
PAY_AMT5  4799.388 15278.31 30000 0 0 1500 426529 11.12742 180.0639
PAY_AMT6  5215.503 17777.47 30000 0 0 1500 528666 10.64073 167.1614

The default to payment next month was our binary target variable. From the dataset, we can see that 77.88% of the customers are not expected to default payment, whereas 22.12% of the customers were expected to default payment. This shows a high imbalance on the target variable, which, once we determine the best SVM model, will assign a 22.12% cutoff, denoting the true positives.

Bar chart of the target variable

Figure 1 Bar chart of the target variable

A chi-square test was conducted to determine whether there was an association between the target variable and the input nominal variable. Where based on the results p-value, we see that the association is statistically significant as the p-values are lesser than the assumed alpha 0.05

Table 2 Chi-square summary

Input Chi-Square Degrees of freedom Prob
PAY_0 5365.9650 10 <.0001
PAY_2 3474.4668 10 <.0001
PAY_3 2622.4621 10 <.0001
PAY_4 2341.4699 10 <.0001
PAY_5 2197.6949 9 <.0001
PAY_6 1886.8353 9 <.0001
EDUCATION 163.2166 6 <.0001
SEX 47.9054 1 <.0001
MARRIAGE 35.6624 3 <.0001

Data Cleaning and Preparation

Support vector machine(SVM) was implemented on the binary target variable default payment for the next month where 0 implied the customer didn’t make any default, whereas 1 is the case where the client actually made a default in payment for the next month. Here, we see that the SVM performance was better with a large number of features and a small sample size, hence the need for data partitioning. The first preparation conducted was to split the data into a stratified training, validation, and testing set. Putting in mind that the target variable was imbalanced, we used a split ratio of 4:3:3 with 40% of the data being allocated to the training set, that is, 12000 observations and 30% of the data being given to both the validation set and testing set being 9000 observations each. Here, the goal was to minimize the data sample as much as we could. The next step was to transform the continuous features by standardizing them so as to assume normality (zero skew and zero kurtosis) and also to place them on a relative scale. Conversion of all the features to numeric variables was also implemented. We then proceeded to check for any missing data present, and we came to the conclusion that there was no missing data current, hence no need to input the data.

Predictive models Developed

This section will look at the results generated for the different SVM models, which varied based on kernels which were the linear kernel and the polynomial kernel both for interior point settings, and the polynomial kernel, the sigmoidal kernel, and the Radial basis kernel for active point setting. Here, we developed two scenarios to test the performances of each of the models. Scenario one was without data preparation, and the second scenario was with data preparation, whose results were as follows.

Table 3 Model summary without preparation

Model Description Validation Misclassification Rate  Train Average Squared Error  Train Misclassification Rate  Valid Average Squared Error
HP SVM Polynomial Interior 0.18133 0.16269 0.17926 0.16484
HP SVM Polynomial 4 active 0.18156 0.20465 0.17326 0.20708
HP SVM Radial Active 0.18700 0.14835 0.18852 0.14858
HP SVM Linear Interior 0.19056 0.15511 0.19285 0.15487
HP SVM Sigmoid 1 0.29467 0.20380 0.29044 0.20691
HP SVM Sigmoid 2 0.30056 0.21052 0.30144 0.21322

Here, we will look to monitor the model’s performance based on the misclassification rate where the model chosen in this case is the hyper-tuned SVM polynomial interior with a polynomial degree of three, which had a validation misclassification rate of 0.18133. We see that only 18.13% of the validation observations were incorrectly predicted, but this needs to be more accurate as the target variable is heavily imbalanced, hence the introduction of the cutoff node to the best model.

Table 4 Model summary with preparation

Model Description Validation Misclassification Rate  Train Average Squared Error Train Misclassification Rate Valid Average Squared Error
HP SVM interior point-polynomial 0.18311 0.18250 0.17135 0.18727
HP SVM Active set Radial 0.19167 0.14421 0.18660 0.14740
HP SVM active polynomial p=4 0.19344 0.19605 0.14310 0.20595
HP SVM interior point -linear 0.20056 0.15400 0.19910 0.15448
HP SVM Active Sigmoid 1 0.22056 0.18362 0.22152 0.18452
HP SVM Sigmoid 2 0.22122 0.22106 0.22110 0.22118

Similar to the model summary above, we also check on the misclassification rate for the SVM model with data preparation implemented. Here, we see that the hyper-tuned SVM polynomial interior with a polynomial degree of three was also the model with the least misclassification rate of 0.18311.

Results

In this section will look at the accuracies for each model in each scenario with their respective confusion matrix for the best models. The results were as follows

Model Description Training accuracies without preparation Training accuracies with preparation
HP SVM interior point-polynomial 0.8207 0.8287
HP SVM Active set Radial 0.8115 0.8134
HP SVM active polynomial p=4 0.8267 0.8569
HP SVM interior point -linear 0.8072 0.8009
HP SVM Active Sigmoid 1 0.7096 0.7785
HP SVM Sigmoid 2 0.6986 0.7789

Here, we see, based on the model summary for each scenario, that the hyper-tuned SVM active polynomial with a polynomial degree of 4 had the highest accuracy of 0.8569, that’s 85.69% accurate predictions.

table of statistics

Conclusion

In conclusion, based on the model result, we see that the best SVM model was the HP SVM active polynomial p=4 as it had the highest accuracy score of 0.8569, which was the fraction of correct predictions and a misclassification score of 0.19344. We also see, based on the classification report, that 81.24% were classified as 1:1, with 83.45% being classified as 0:0. This shows a better balance in the target variable.

Reference

  1. Caplinska, A., & Tvaronavičienė, M. (2020). Creditworthiness is placed in credit theory and its methods of evaluation. Entrepreneurship and sustainability issues7(3), 2542.
  2. Shen, G., & Du, Y. (2022, March). Research on the Construction of a Personal Credit Risk Assessment Index System based on PCA. In CIBDA 2022; 3rd International Conference on Computer Information and Big Data Applications(pp. 1-4). VDE.
  3. Tao, W. (2010, May). Evaluation and Construction of Individual credit evaluation system based on a third-party e-commerce transaction platform. In 2010 International Conference on E-Business and E-Government(pp. 283-286). IEEE.

Appendix

Table 5 Data description

Feature Description
ID  The ID of each client
LIMIT_BAL  Amount of given credit in NT dollars (includes individual and family/supplementary credit
SEX  Gender (1=male, 2=female)
EDUCATION  (1=graduate school, 2=university, 3=high school, 4=others, 5=unknown, 6=unknown)
MARRIAGE  Marital status (1=married, 2=single, 3=others)
AGE  Age in years
PAY_0  Repayment status in September, 2005 (-1=pay duly, 1=payment delay for one month, 2=payment delay for two months, … 8=payment delay for eight months, 9=payment delay for nine months and above)
PAY_2  Repayment status in August 2005 (scale same as above)
PAY_3  Repayment status in July 2005 (scale same as above)
PAY_4  Repayment status in June 2005 (scale same as above)
PAY_5  Repayment status in May 2005 (scale same as above)
PAY_6  Repayment status in April 2005 (scale same as above)
BILL_AMT1  Amount of bill statement in September 2005 (NT dollar)
BILL_AMT2  Amount of bill statement in August 2005 (NT dollar)
BILL_AMT3  Amount of bill statement in July 2005 (NT dollar)
BILL_AMT4  Amount of bill statement in June 2005 (NT dollar)
BILL_AMT5  Amount of bill statement in May 2005 (NT dollar)
BILL_AMT6  Amount of bill statement in April 2005 (NT dollar)
PAY_AMT1  Amount of previous payment in September 2005 (NT dollars)
PAY_AMT2  Amount f previous payment in August 2005 (NT dollars)
PAY_AMT3  Amount of previous payment in July 2005 (NT dollars)
PAY_AMT4  Amount of previous payment in June 2005 (NT dollars)
PAY_AMT5  Amount of previous payment in May 2005 (NT dollars)
PAY_AMT6  Amount of previous payment in April 2005 (NT dollars)
default payment next month  Default payment (1=yes, 0=no)

 

 

Don't have time to write this essay on your own?
Use our essay writing service and save your time. We guarantee high quality, on-time delivery and 100% confidentiality. All our papers are written from scratch according to your instructions and are plagiarism free.
Place an order

Cite This Work

To export a reference to this article please select a referencing style below:

APA
MLA
Harvard
Vancouver
Chicago
ASA
IEEE
AMA
Copy to clipboard
Copy to clipboard
Copy to clipboard
Copy to clipboard
Copy to clipboard
Copy to clipboard
Copy to clipboard
Copy to clipboard
Need a plagiarism free essay written by an educator?
Order it today

Popular Essay Topics