# Bayesian Improved Surname Geocoding (BISG) Race Predictor

Bayesian Improved Surname Geocoding (BISG) is used by the CFPB to construct proxies for race and ethnicity. In recent years the algorithm has been used to assess alleged discrimination at auto finance companies, including a case that led to an \$80 million fine for a well-known bank. The method is far from perfect. For example, the probabilities ascribed to someone can change after marriage or a change of residence.

I built a prototype calculator with R/Shinydashboard, available at https://pabdndiaye.shinyapps.io/bisg_shiny/, for readers to play with. It takes two inputs, ‘name’ and ‘zip code’, and uses Bayes' rule to ascribe the probabilities of that person belonging to various races and ethnicities.
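To make the Bayes step concrete, here is a minimal sketch in Python (not the R code behind the calculator, which is not shown here). It assumes the usual BISG form of the update, in which the posterior is proportional to P(race | surname) times P(zip code | race). The function name `bisg_posterior` and every number below are hypothetical and invented for illustration; real inputs would come from Census surname and geography tables.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography evidence with Bayes' rule.

    posterior(race) is proportional to P(race | surname) * P(geo | race),
    then normalized so the probabilities sum to 1.
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total <= 0:
        raise ValueError("surname and geography evidence are incompatible")
    return {race: weight / total for race, weight in unnormalized.items()}


# Hypothetical inputs, invented for illustration only.
# P(race | surname): share of people with this surname in each group.
surname_probs = {
    "white": 0.70, "black": 0.20, "hispanic": 0.05, "asian": 0.03, "other": 0.02,
}
# P(zip | race): share of each group's population living in this zip code.
zip_share_by_race = {
    "white": 0.0001, "black": 0.0008, "hispanic": 0.0002, "asian": 0.0001, "other": 0.0002,
}

posterior = bisg_posterior(surname_probs, zip_share_by_race)
for race, prob in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{race:9s} {prob:.3f}")
```

With these made-up inputs the most likely group under the surname alone differs from the most likely group once the zip code is taken into account, which is the same mechanism by which a move or a name change can shift someone's ascribed probabilities.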

# Understanding Correlations and Copulas in Finance: An Application in Risk Portfolio Aggregation using R

I am not a fan of articles where the authors use widgets and other unrelatable examples to illustrate complex concepts. Here I will illustrate the use of copulas in finance with a risk aggregation example to drive the points home. First, though, it is important to briefly explain the risk aggregation problem. There are plenty of texts available on the internet which go into great detail about the inner workings of copulas. My goal is to help the reader develop an intuitive understanding of how copulas are used in finance.

# Machine Learning and Credit Risk (part 6) – Multi-class Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) is another method which should be familiar to statisticians and economists. LDA is a dimensionality reduction technique which has found its use in machine learning because of how well it functions as a classifier. Its primary goal is to project data onto a lower-dimensional space.

# Machine Learning and Credit Risk (part 5) – Neural Networks

Neural network models are a flexible class of machine learning algorithms which can be used for both supervised and unsupervised learning and can approximate discrete or continuous functions. They are loosely modeled on the workings of the human brain and attempt to allow computers to learn in a manner similar to humans.

# Machine Learning and Credit Risk (part 4) – Support Vector Machines

Support Vector Machine (SVM) algorithms are some of the best “out-of-the-box” machine learning tools available. They handle both linear and nonlinear classification, and they can be extended from binary classification to multi-class classification.

# Machine Learning and Credit Risk (part 3) – Multinomial Logistic Regression

Logistic regression has been a reliable tool in the toolkits of many statisticians and economists for years when dealing with binary problems where the output is 0/1, True/False, or any other dichotomy. Multinomial logistic regression, which extends it to more than two classes, is also a very important ‘algorithm’ in the machine learning sphere.

# Machine Learning and Credit Risk (part 2) – Credit Cycle Method

The canonical method for forecasting a credit migration matrix is an econometric model: the one-factor approach described in Belkin et al. (1998). This approach conditions migration (transition) matrices on a systematic component which represents the “credit cycle” and relates economic conditions to the credit quality of a loan portfolio. The credit cycle can be thought of as the historical pattern in credit ratings shared by all borrowers in a sector or economy.
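As a rough illustration of what conditioning a transition matrix on a credit cycle can look like, the Python sketch below assumes the common one-factor setup in which a borrower's latent credit index is X = sqrt(rho) * Z + sqrt(1 - rho) * Y, with Z the systematic credit cycle factor and Y an independent standard normal shock. The function names, the four-state row, and the value of `rho` are hypothetical and chosen only for illustration; they are not taken from the post or from Belkin et al. (1998).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def bin_edges(row):
    """Map one row of unconditional transition probabilities, ordered from
    worst state to best, to bin edges on the standard normal scale.
    Assumes every probability in the row is strictly positive."""
    edges = [float("-inf")]
    cumulative = 0.0
    for p in row[:-1]:
        cumulative += p
        edges.append(STD_NORMAL.inv_cdf(cumulative))
    edges.append(float("inf"))
    return edges


def conditional_row(row, z, rho):
    """Transition probabilities for the same row, given credit cycle value z
    and correlation rho (0 <= rho < 1) between the latent index and the cycle."""
    shift = rho ** 0.5 * z
    scale = (1.0 - rho) ** 0.5
    cdf = [STD_NORMAL.cdf((edge - shift) / scale) for edge in bin_edges(row)]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


# Hypothetical unconditional row: [default, downgrade, stay, upgrade].
unconditional = [0.02, 0.08, 0.85, 0.05]
rho = 0.10  # hypothetical sensitivity to the credit cycle

good_year = conditional_row(unconditional, z=1.0, rho=rho)
bad_year = conditional_row(unconditional, z=-1.0, rho=rho)
print("good year:", [round(p, 4) for p in good_year])
print("bad year: ", [round(p, 4) for p in bad_year])
```

Under these assumptions a positive Z shifts probability mass toward the better states and a negative Z toward default and downgrade, while each conditional row still sums to one.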