Import Credit Data Set in R
We use the German Credit Scoring data set in numeric format, which contains 21 attributes for each of 1000 loans.
First, set up a working directory and place the data file in it. Then import the data into your R session.
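A minimal sketch of such an import, assuming the file is comma-separated with a header row. The file name `german_credit.csv` and the helper `import_credit_data()` are hypothetical placeholders; substitute the name and location of the file you actually downloaded.

```r
# Hypothetical helper: read the credit data from a CSV file.
# Assumes a comma-separated file with a header row.
import_credit_data <- function(path) {
  if (!file.exists(path)) {
    stop("Data file not found: ", path)
  }
  read.csv(path, header = TRUE, sep = ",")
}

# Hypothetical file name; it is looked up in the current working directory,
# which you can inspect with getwd() and change with setwd().
data_file <- "german_credit.csv"

if (file.exists(data_file)) {
  creditdata <- import_credit_data(data_file)
  print(dim(creditdata))  # rows and columns actually read
  str(creditdata)         # column names and types
}
```

If the import worked, `dim(creditdata)` should report the 1000 loans and 21 attributes described above.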
Attribute Details
Twenty attributes are used to judge a loan applicant. The goal is to classify the applicant into one of two categories, good or bad. This class is recorded in the first attribute, Creditability, which indicates whether the applicant is creditworthy.
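The split between the target and the predictors can be sketched as follows. This uses a small made-up data frame rather than the real file; the column names `Creditability`, `Account.Balance` and `Credit.Amount` are assumptions about how the columns are named after import.

```r
# Toy stand-in for the imported data: the first column is the target,
# the remaining columns are predictors. Values are made up for illustration.
toy <- data.frame(
  Creditability   = c(1, 0, 1, 1, 0),
  Account.Balance = c(1, 2, 4, 3, 1),
  Credit.Amount   = c(1049, 2799, 841, 2122, 2171)
)

target     <- toy[[1]]                 # first attribute: 0 = bad, 1 = good
predictors <- toy[, -1, drop = FALSE]  # everything else

class_counts <- table(target)          # number of bad vs. good applicants
print(class_counts)
print(names(predictors))
```

On the real data, the same `table()` call shows how the 1000 loans are divided between the two classes.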
Some more notes about the data:
- Creditability
  - 0: Bad credit
  - 1: Good credit
- Account Balance
  - 1: < 0 DM
  - 2: < 200 DM
  - 3: >= 200 DM
  - 4: No existing account
- Duration of Credit (Month): loan duration in months
- Payment Status of Previous Credit: credit history
  - 0: No credits taken so far
  - 1: All credits at this bank paid back duly
  - 2: Existing credits paid back duly till now
  - 3: Delay in paying off in the past
  - 4: Credits existing at other banks
- Purpose: loan purpose
  - 0: new car purchase
  - 1: used car purchase
  - 2: furniture or equipment purchase
  - 3: radio or television purchase
  - 4: domestic appliances purchase
  - 5: repairs
  - 6: education
  - 7: vacation
  - 8: retraining
  - 9: business
  - 10: others
- Credit Amount (in DM, Deutsche Mark)
- Value of Savings/Stocks
  - 1: < 100 DM
  - 2: >= 100 and < 500 DM
  - 3: >= 500 and < 1000 DM
  - 4: >= 1000 DM
  - 5: no savings / bonds
- Length of Current Employment
  - 1: unemployed
  - 2: < 1 year
  - 3: >= 1 and < 4 years
  - 4: >= 4 and < 7 years
  - 5: >= 7 years
- Instalment Percent: instalment rate as a percentage of disposable income
- Sex & Marital Status
  - 1: Divorced Male
  - 2: Divorced/Married Female
  - 3: Male Single
  - 4: Married Male
  - 5: Female Single
- Guarantors
  - 1: None
  - 2: Co-applicant
  - 3: Guarantor
- Duration in Current Address (in years)
- Most Valuable Available Asset
  - 1: Real Estate
  - 2: Life Insurance
  - 3: Car or others
  - 4: No property
- Age in Years
- Concurrent Credits
- Type of Apartment
  - 1: Rented
  - 2: Owned
  - 3: For Free
- No. of existing credits at this bank
- Occupation (Job Status)
  - 1: Unemployed non-resident
  - 2: Unemployed resident
  - 3: Skilled Employee
  - 4: Self-Employed
- No. of dependents
- Telephone: German phone rates are very high, so fewer people own telephones
  - 1: Available
  - 2: Not Available
- Foreign worker: there are millions of foreign workers in Germany
  - 1: No
  - 2: Yes
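
Because the categorical attributes are stored as numeric codes, R will treat them as ordinary numbers unless told otherwise. A minimal sketch of turning a few coded columns into labelled factors, using the code-to-label mappings from the list above. The data frame is a made-up stand-in, and the column names `Creditability`, `Account.Balance` and `Telephone` are assumptions about how the columns are named after import.

```r
# Toy stand-in for a few coded columns; values are made up for illustration.
toy <- data.frame(
  Creditability   = c(1, 0, 1, 0),
  Account.Balance = c(1, 4, 2, 3),
  Telephone       = c(1, 2, 2, 1)
)

# Code-to-label mappings taken from the attribute list above.
toy$Creditability <- factor(toy$Creditability,
                            levels = c(0, 1),
                            labels = c("Bad", "Good"))
toy$Account.Balance <- factor(toy$Account.Balance,
                              levels = 1:4,
                              labels = c("< 0 DM", "< 200 DM",
                                         ">= 200 DM", "No existing account"))
toy$Telephone <- factor(toy$Telephone,
                        levels = 1:2,
                        labels = c("Available", "Not available"))

str(toy)                                               # columns are now factors
print(table(toy$Account.Balance, toy$Creditability))   # cross-tabulation
```

The same pattern applies to the other coded attributes; genuinely numeric ones such as credit amount, duration and age can stay numeric.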
This dataset is typical of data used in data mining: we have 1000 records.