A support vector machine (SVM) is a supervised learning technique that can be used for both classification and regression. As a classifier, it chooses between two or more possible outcomes based on continuous or categorical predictor variables. After training on labeled examples, the SVM algorithm assigns each new observation to one of the given categories. The data are represented as points in space, and the categories are separated by a boundary that can be linear or, through a kernel function, non-linear.
For the SVM, we use the radial basis function (RBF) as the kernel. Because of limited computing resources, we use 5% of the downsampled data for parameter tuning and 10% of the downsampled data for training.
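For reference, the radial basis kernel scores the similarity of two observations as exp(-sigma * ||x - y||^2), which is the form kernlab documents for `rbfdot`. The following is a minimal base R sketch of that formula. The helper `rbf_kernel`, the two example vectors, and the value sigma = 0.5 are made up for illustration and do not come from the loan data.

```r
# Radial basis function (RBF) kernel: k(x, y) = exp(-sigma * ||x - y||^2).
# `rbf_kernel` is an illustrative helper, not a function from caret or kernlab.
rbf_kernel <- function(x, y, sigma) {
  stopifnot(length(x) == length(y), sigma > 0)
  exp(-sigma * sum((x - y)^2))
}

# Two made-up observations with three numeric predictors each.
x <- c(1, 0, 2)
y <- c(0, 1, 4)

k_same <- rbf_kernel(x, x, sigma = 0.5)  # identical points: distance 0
k_diff <- rbf_kernel(x, y, sigma = 0.5)  # squared distance is 1 + 1 + 4 = 6
```

A smaller sigma makes the kernel decay more slowly with distance, which tends to give a smoother decision boundary.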
library(caret)  # provides createDataPartition
# choose a small subset for tuning
train_index_tuning = createDataPartition(samp$loan_status, p = 0.05, list = FALSE, times = 1)
# choose a larger subset for re-training
train_index_training = createDataPartition(samp$loan_status, p = 0.1, list = FALSE, times = 1)
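For a factor outcome, `createDataPartition` samples within each class, so each subset keeps roughly the class balance of `samp$loan_status`. The base R sketch below imitates that idea on made-up labels. The helper `stratified_index` and the toy `status` vector are hypothetical, and the sketch is not caret's actual implementation.

```r
# Illustrative stratified sampling: draw a fraction p of the rows within each class.
# `stratified_index` is a hypothetical helper, not part of caret.
stratified_index <- function(y, p) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    n_pick <- round(length(idx) * p)
    idx[sample.int(length(idx), n_pick)]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
# Made-up outcome with 400 defaults and 600 fully paid loans.
status <- factor(rep(c("Default", "Fully.Paid"), times = c(400, 600)))

tuning_idx   <- stratified_index(status, p = 0.05)
training_idx <- stratified_index(status, p = 0.10)

# Class counts in each subset follow the 40/60 split of the full vector.
tuning_counts   <- table(status[tuning_idx])
training_counts <- table(status[training_idx])
```

Because the two index sets are drawn independently, some rows can appear in both the tuning and the training subsets.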
> svmTuned
Support Vector Machines with Radial Basis Function Kernel

1268 samples
70 predictor
2 classes: 'Default', 'Fully.Paid'

No pre-processing
Resampling: Cross-Validated (3 fold)
Summary of sample sizes: 845, 845, 846
Resampling results across tuning parameters:

  sigma        C     ROC        Sens       Spec
  0.003796662   0.1  0.6817007  0.5912024  0.6639021
  0.003796662   1.0  0.6758736  0.6261886  0.6388193
  0.003796662  10.0  0.6550318  0.6151599  0.5899431
  0.008035656   0.1  0.6713062  0.5851292  0.6751244
  0.008035656   1.0  0.6708204  0.6277907  0.6072610
  0.008035656  10.0  0.6295020  0.5946899  0.5789442
  0.013088587   0.1  0.6630773  0.6025217  0.6261960
  0.013088587   1.0  0.6672180  0.6356523  0.5978122
  0.013088587  10.0  0.6156558  0.6041164  0.5615667

ROC was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.003796662 and C = 0.1.
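The selection rule stated in the printout (pick the row with the largest ROC) can be checked by hand. The sketch below copies the resampling results from the output above into a data frame and finds the best row in base R. The names `results` and `best` are illustrative.

```r
# Cross-validation results copied from the svmTuned printout.
results <- data.frame(
  sigma = rep(c(0.003796662, 0.008035656, 0.013088587), each = 3),
  C     = rep(c(0.1, 1, 10), times = 3),
  ROC   = c(0.6817007, 0.6758736, 0.6550318,
            0.6713062, 0.6708204, 0.6295020,
            0.6630773, 0.6672180, 0.6156558),
  Sens  = c(0.5912024, 0.6261886, 0.6151599,
            0.5851292, 0.6277907, 0.5946899,
            0.6025217, 0.6356523, 0.6041164),
  Spec  = c(0.6639021, 0.6388193, 0.5899431,
            0.6751244, 0.6072610, 0.5789442,
            0.6261960, 0.5978122, 0.5615667)
)

# Row with the largest cross-validated ROC.
best <- results[which.max(results$ROC), ]
print(best)
```

For every sigma in the grid, ROC is lowest at C = 10, so the heavier penalty on misclassification does not help on this sample.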
The best parameters for the model are sigma = 0.003796662 and C = 0.1. We use these values to fit the model on the 10% subset of the downsampled data and evaluate its performance on the test set.
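library(kernlab)  # provides ksvm and the rbfdot kernel
# fit the final SVM with the tuned sigma and C on the 10% training subset
svm_model = ksvm(loan_status ~ .,
                 data = samp[train_index_training, ],
                 kernel = "rbfdot",
                 kpar = list(sigma = 0.003796662),
                 C = 0.1,
                 prob.model = TRUE,
                 scaled = FALSE)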
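Test-set performance can be summarized with the same sensitivity and specificity measures reported during tuning. The following is a minimal base R sketch, assuming 'Default' is the positive class (it is the first class level in the printout, which caret's two-class summary treats as the event). The helper `class_metrics` and the `actual` and `predicted` vectors are made up for illustration and are not results from the loan data.

```r
# Sensitivity, specificity and accuracy from actual vs. predicted class labels.
# `class_metrics` is an illustrative helper, not part of caret or kernlab.
class_metrics <- function(actual, predicted, positive = "Default") {
  stopifnot(length(actual) == length(predicted))
  tp <- sum(actual == positive & predicted == positive)
  fn <- sum(actual == positive & predicted != positive)
  tn <- sum(actual != positive & predicted != positive)
  fp <- sum(actual != positive & predicted == positive)
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    accuracy    = (tp + tn) / length(actual))
}

# Made-up labels: 4 actual defaults and 6 actual fully paid loans.
actual    <- c(rep("Default", 4), rep("Fully.Paid", 6))
predicted <- c("Default", "Default", "Default", "Fully.Paid",
               "Fully.Paid", "Fully.Paid", "Fully.Paid", "Fully.Paid",
               "Default", "Default")

metrics <- class_metrics(actual, predicted)
print(round(metrics, 3))
```

In practice the `predicted` vector would come from something like `predict(svm_model, newdata = test_set)`, where `test_set` is a placeholder name because the text does not name the held-out data.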