Use R to explore a real-life data set, then preprocess it into the appropriate format before applying the credit risk models. First, I examined the `loan_data` data set, which is used in the videos and throughout the DataCamp exercises.

• Goal: understand the number and percentage of defaults.
• Examine the relationship between `loan_status` and certain `factor` variables.

Default information is stored in the response variable `loan_status`, where 1 represents a default and 0 represents a non-default.
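A minimal sketch of counting defaults, assuming a small hypothetical vector `loan_status` in place of the real `loan_data$loan_status` column. `table()` gives the counts and `prop.table()` turns them into proportions, which are multiplied by 100 to get percentages.

```r
# Hypothetical loan_status values (1 = default, 0 = non-default);
# with the real data you would use loan_data$loan_status instead
loan_status <- c(0, 0, 1, 0, 0, 0, 1, 0, 0, 0)

# Number of non-defaults ("0") and defaults ("1")
status_counts <- table(loan_status)
print(status_counts)

# Percentage of non-defaults and defaults
status_pct <- 100 * prop.table(status_counts)
print(status_pct)
```

With these illustrative values there are 2 defaults out of 10 loans, i.e. 20%.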

For example, you would expect that the proportion of defaults in the group of customers with `grade` G (worst credit rating score) is substantially higher than the proportion of defaults in the `grade` A group (best credit rating score).

• Expected loss: EL = PD × EAD × LGD

The components of expected loss (EL) are the probability of default (PD), the exposure at default (EAD), and the loss given default (LGD).
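As a worked example of the formula, the sketch below computes EL for a single loan. The input values are hypothetical and chosen only for illustration: a 5% PD, an exposure of 10,000 and an LGD of 40% (expressed as a fraction of EAD).

```r
# Hypothetical inputs for a single loan (illustrative, not from loan_data)
pd  <- 0.05    # probability of default
ead <- 10000   # exposure at default, in currency units
lgd <- 0.40    # loss given default, as a fraction of EAD

# Expected loss is the product of the three components
expected_loss <- function(pd, ead, lgd) {
  pd * ead * lgd
}

el <- expected_loss(pd, ead, lgd)
print(el)  # 200
```

Because R arithmetic is vectorized, the same function also works on vectors of PD, EAD and LGD values for a whole portfolio, and `sum()` of the result gives the portfolio's expected loss.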

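```r
# Load gmodels, which provides CrossTable()
library(gmodels)
# Call CrossTable() on grade and loan_status; prop.r = TRUE gives row proportions
CrossTable(loan_data$grade, loan_data$loan_status, prop.r = TRUE,
           prop.c = FALSE, prop.t = FALSE, prop.chisq = FALSE)
```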
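To see what `prop.r = TRUE` reports, here is a base R sketch that computes the same row proportions (the share of non-defaults and defaults within each grade) with `table()` and `prop.table(..., margin = 1)`. The small data frame `loans` is hypothetical and stands in for `loan_data`.

```r
# Hypothetical stand-in for loan_data with three grades
loans <- data.frame(
  grade       = c("A", "A", "A", "A", "B", "B", "B", "G", "G", "G"),
  loan_status = c(0,   0,   0,   1,   0,   0,   1,   1,   1,   0)
)

# Two-way table: rows are grades, columns are loan_status (0/1)
grade_status <- table(loans$grade, loans$loan_status)

# Row proportions: each row sums to 1
row_props <- prop.table(grade_status, margin = 1)
print(round(row_props, 3))

# Default rate per grade (the "1" column)
default_rate <- row_props[, "1"]
print(default_rate)
```

In this made-up sample, grade G has a default rate of about 0.67 versus 0.25 for grade A, matching the pattern you would expect from the credit ratings.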

• Use `hist()` to create a histogram with only one argument, `loan_data$loan_amnt`. Assign the result to a new object called `hist_1`.
• Use `hist_1$breaks` to get more information on the histogram breaks. Knowing where the breaks lie is important because poorly chosen breaks can make a histogram misleading.
• Change the number of breaks to 200 by calling `hist()` again with the `breaks` argument. Additionally, label the x-axis `"Loan amount"` using the `xlab` argument and title the plot `"Histogram of the loan amount"` using the `main` argument.
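The steps above can be sketched as follows. Since `loan_data` is not loaded here, the example assumes a hypothetical vector `loan_amnt` of simulated, right-skewed loan amounts; the name `hist_2` for the second histogram is also just an illustrative choice.

```r
# Hypothetical loan amounts; with the real data use loan_data$loan_amnt
set.seed(42)
loan_amnt <- round(rlnorm(1000, meanlog = 9, sdlog = 0.6))

# Default histogram; hist() both plots and returns an object of class "histogram"
hist_1 <- hist(loan_amnt)

# Breakpoints chosen by the default rule (Sturges)
print(hist_1$breaks)

# Ask for about 200 bins and label the plot
hist_2 <- hist(loan_amnt, breaks = 200, xlab = "Loan amount",
               main = "Histogram of the loan amount")

# Number of bins actually used
print(length(hist_2$breaks) - 1)
```

Note that when `breaks` is a single number, R treats it only as a suggestion and rounds the breakpoints to "pretty" values, so the resulting number of bins may not be exactly 200.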