Machine Learning and Credit Risk (part 4) – Support Vector Machines

Support Vector Machine (SVM) algorithms are some of the best “out-of-the-box” machine learning tools available. They handle both linear and nonlinear classification and can be extended from binary classification to multi-class classification.

The principle behind an SVM is to build an optimal decision boundary that separates (or classifies) the data into different classes. Once we create a reliable decision boundary between our various classes, we can predict with a high degree of certainty where new points will fall.

Once a decision boundary is obtained, the algorithm computes the margin for each data point: the distance from that point to the decision boundary. The larger the margin, the more confident we are in the classification. A small margin indicates less confidence in the classification and hence in the predictive power of the model.
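To make the margin concrete, the short sketch below (a toy illustration only, separate from the credit-risk data used later in this post) fits a linear SVM from the e1071 package to two simulated classes and recovers each point's distance to the decision boundary; the simulated data and parameter choices are assumptions made purely for the example.

library(e1071)

set.seed(1)
# Two well-separated toy classes in two dimensions
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 3), ncol = 2))
y <- factor(rep(c("A", "B"), each = 20))

fit <- svm(x, y, kernel = "linear", scale = FALSE)

# Signed decision values f(x) = w'x + b for every training point
pred <- predict(fit, x, decision.values = TRUE)
dv   <- attr(pred, "decision.values")

# For a linear kernel, w = sum(alpha_i * y_i * x_i); the geometric margin
# of a point is f(x) / ||w||
w <- t(fit$coefs) %*% fit$SV
margins <- dv / sqrt(sum(w^2))
summary(abs(margins))   # larger |margin| -> more confident classification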

As shown in Figure 6a, there exist many decision boundaries that can separate the data. Intuitively, a decision boundary that passes close to any single point may not adequately generalize the separation of the classes.

Figure 6a. Support Vector Machines – Examples of Less Optimal Decision Boundaries


On the other hand, there is one decision boundary that provides the best fit: the optimal decision boundary, which maximizes the margin of the training data. In Figure 6b, below, the solid blue line is the optimal decision boundary, and the points (or vectors) C and K are the support vectors because they support the optimal decision boundary. Even without the other points, we could still find the same optimal decision boundary.

Figure 6b. Support Vector Machines – Example of an Optimal Decision Boundary


SVMs are typically used as classification tools. There are several techniques for converting the outputs of SVM classifiers into calibrated probabilities. Since our migration matrices contain probabilities, these probability-calibrated versions of SVMs are better suited to our needs.

Platt scaling (a logistic transformation of the SVM’s scores, fit by an additional cross-validation on the training data) is the most commonly implemented method for converting binary SVM classification outputs into probabilities. The method has drawn some criticism that, being a post-hoc adjustment, it may not yield a true probability measure. In the multi-class case, it is extended as per Wu et al. (2004).

The various implementations of SVM (for example, in R and Python) allow flexibility through the use of kernel functions. The implementation of SVM in the R package e1071 provides the most common kernels, including linear, polynomial, radial basis, and sigmoid.
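As an illustration of both points, the sketch below uses e1071’s svm() on the built-in iris data (the cost and gamma values are arbitrary placeholders chosen only for the example). It selects a kernel and requests calibrated class probabilities; e1071 produces these estimates with a Platt-style scaling of the decision values and, for multi-class problems, combines them via the pairwise-coupling approach of Wu et al.

library(e1071)
data(iris)

# Radial-basis SVM with probability calibration enabled at training time
fit <- svm(Species ~ ., data = iris, kernel = "radial",
           cost = 1, gamma = 0.25, probability = TRUE)

# Class predictions plus calibrated class probabilities
pred  <- predict(fit, iris, probability = TRUE)
probs <- attr(pred, "probabilities")
head(round(probs, 3))

# Other kernels are selected the same way: "linear", "polynomial", "sigmoid"
fit_lin <- svm(Species ~ ., data = iris, kernel = "linear", probability = TRUE)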

Disadvantages:

  • It is not always evident how to parameterize the SVM model. The ‘caret’ package in R and other packages in Python help with the parameterization through a grid search (see the sketch after this list).
  • It can sometimes be tricky to find the appropriate kernel.
  • SVMs are not very efficient with a large number of observations; as a result, SVM takes longer to run than either MNL or LDA.
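As a sketch of the grid-search idea mentioned in the first bullet, e1071’s tune.svm() cross-validates over user-supplied cost and gamma grids. The grids below are arbitrary illustrative values, echoing the cost.weights and gamma.weights vectors used in the RTransProb call further down.

library(e1071)
data(iris)

# Grid search over cost and gamma, cross-validated (10-fold by default)
tuned <- tune.svm(Species ~ ., data = iris,
                  kernel = "radial",
                  cost   = c(0.01, 0.1, 1, 10, 100),
                  gamma  = c(0.01, 0.1, 1))

tuned$best.parameters   # the cost/gamma pair with the lowest cross-validation error
tuned$best.model        # an SVM refit with those parameters

The caret package offers the same kind of search through its train() interface.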

Figure 7 displays the conditional matrices derived by the Support Vector Machine approach: the forecasted migration matrices conditioned on Baseline, Adverse, and Severely Adverse economic conditions.

Figure 7. Support Vector Machines – Forecasted Conditional Migration Matrices

library("RTransProb")

# Loop over three forecast periods and estimate a conditional migration
# matrix for each, using the Severely Adverse macroeconomic scenario
for (i in c(24, 25, 26)) {
  print(paste("RUN-", i, sep = ""))

  # Historical transition data, normalized macro history, and scenario forecast data
  data     <- data
  histData <- histData.normz

  predData_svm2 <- predData_svm_SeverelyAdverse
  predData_svm2 <- subset(
    predData_svm2,
    X == i,
    select = c(Market.Volatility.Index..Level..normz)
  )

  # Explanatory (macro) variable and response variable
  indVars <- c("Market.Volatility.Index..Level..normz")
  depVar  <- c("end_rating")

  # Estimation window
  startDate <- "1991-08-16"
  endDate   <- "2007-08-16"

  pct       <- 1
  wgt       <- "mCount"
  ratingCat <- c("A", "B", "C", "D", "E", "F", "G")
  defind    <- "G"                  # rating category treated as default
  lstCategoricalVars <- c("end_rating")

  # SVM settings: fixed cost/gamma, with candidate grids used when tuning is enabled
  tuning        <- "FALSE"
  cost          <- 0.01
  gamma         <- 0.01
  cost.weights  <- c(0.01, 0.05, 0.1, 0.25, 10, 50, 100)
  gamma.weights <- c(0.01, 0.05, 0.1, 0.25, 10, 50, 100)
  kernelType    <- "radial"

  # Estimation method and transition horizon settings
  method    <- "cohort"
  snapshots <- 1
  interval  <- 1

  svm_TM <-
    transForecast_svm(
      data, histData, predData_svm2, startDate, endDate, method, interval,
      snapshots, defind, depVar, indVars, ratingCat, pct, tuning,
      kernelType, cost, cost.weights, gamma, gamma.weights
    )
  print(svm_TM)
}

 

Continue Reading

Previous: Machine Learning and Credit Risk (part 3) – Multinomial Logistic Regression

Next: Machine Learning and Credit Risk (part 5) – Neural Networks
