Bayesian Improved Surname Geocoding (BISG) Race Predictor

Bayesian Improved Surname Geocoding (BISG) is used by the CFPB to construct proxies for race and ethnicity. In recent years, the algorithm has been used to assess alleged discrimination at auto finance companies, including an $80 million fine for a well-known bank. The method is far from perfect. For example, because it relies on surname and place of residence, someone’s ascribed probabilities can change due to marriage or a change of residence.

I created a prototype calculator using R/Shinydashboard for readers to play with. It takes two inputs, ‘name’ and ‘zip code’, and ascribes the probabilities of that person belonging to various races and ethnicities using Bayes’ rule.
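To make the Bayes’ rule step concrete, here is a minimal sketch of the update, written in Python rather than the R used for the calculator. It assumes the usual BISG setup: a surname table supplies P(race | surname), a geography table supplies P(zip code | race), and the two are treated as independent given race. The function name, the category labels and all of the numbers are hypothetical and chosen only so the arithmetic is easy to follow; they are not taken from the calculator or from Census data.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography evidence with Bayes' rule.

    posterior(r) is proportional to P(r | surname) * P(geo | r),
    assuming surname and geography are independent given race.
    Both arguments are dicts keyed by the same category labels.
    """
    if set(p_race_given_surname) != set(p_geo_given_race):
        raise ValueError("both inputs must cover the same categories")

    # Numerator of Bayes' rule for each category.
    unnormalized = {
        r: p_race_given_surname[r] * p_geo_given_race[r]
        for r in p_race_given_surname
    }

    # Denominator: total probability of observing this surname/geography pair.
    total = sum(unnormalized.values())
    if total <= 0:
        raise ValueError("probabilities must have a positive combined weight")

    return {r: value / total for r, value in unnormalized.items()}


# Made-up inputs for illustration only (not real Census figures).
surname_prior = {"group_a": 0.6, "group_b": 0.3, "group_c": 0.1}
geo_likelihood = {"group_a": 0.001, "group_b": 0.004, "group_c": 0.002}

posterior = bisg_posterior(surname_prior, geo_likelihood)
for group, prob in posterior.items():
    print(f"{group}: {prob:.3f}")
```

With these made-up inputs, the surname alone favours group_a, but the zip code is four times more typical for group_b, so the posterior shifts toward group_b. This is also why a change of surname or residence changes the ascribed probabilities: each one replaces one of the two inputs.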


Understanding Correlations and Copulas in Finance: An Application in Risk Portfolio Aggregation using R

I am not a fan of articles where the authors use widgets and other unrelatable examples to illustrate complex concepts. Here I will illustrate the use of copulas in finance using the example of risk aggregation to drive the points home. First, though, it is important to briefly explain the risk aggregation problem. There are plenty of texts available on the internet that go into great detail about the inner workings of copulas. My goal is to help the reader develop an intuitive understanding of how copulas are used in finance.

Read More »

One Data Scientist’s Primitive Approach to Analyzing the SunTrust – BB&T Merger

Recently, when I heard that SunTrust and BB&T merged, I casually wondered about the footprint of the combined bank. Although the number of branches is not necessarily an indicator of the health or performance of a bank, customers do benefit from commercial banks having brick-and-mortar branches. A few of these benefits include:

  • Readily available ATMs for cash withdrawals
  • The ability to make large cash withdrawals
  • More robust banking relationships as customers tend to more readily trust banks with branches they can walk into
  • Human contact to answer money-related questions

Quite often, a data scientist’s job is to summarize and communicate information in a clear and concise manner. We’ve all heard the phrase, “a map is worth a thousand words”, and with good reason. Maps are easy to interpret, they are nice to look at, and they give us context without too many words. In that same spirit, below we present some maps highlighting the geographic coverage of SunTrust, of BB&T and of the combined entity. A visual approach is a pleasing way to let an audience rapidly understand the raw location data.

Read More »