This post by Data Scientist Tony Yiu is a summary of a longer blog post he published on his Medium account, which you can read in full here.
Models provide necessary simplifications to a complex world. They reduce real-world phenomena into a set of key features and relationships that allow us to explain, analyze, and sometimes even predict. But there is a cost to these powerful benefits. Every model comes with several key assumptions, and if these assumptions are not met, the output of the model can become unreliable or even downright dangerous.
In the finance industry, an investment’s returns are famously assumed to be normally distributed, and the volatility (aka standard deviation) of those returns is assumed to be a good approximation of the investment’s risk. These assumptions flow into nearly every metric that portfolio managers use to structure their portfolios or that banks use to measure and hedge their risk. Even after the 2008 recession, when supposed six-sigma-type losses (based on the risk measures at the time) on real estate-related investments and loans wreaked havoc across the industry and almost brought down the global economy, we continue to equate volatility with risk and assume that asset returns are normally distributed.
The truth is that when markets are calm and returns are smooth, which is most of the time, asset returns (whether you measure them daily, weekly, monthly, etc.) are approximately normally distributed. So it’s easy to be fooled into thinking, “It works most of the time, so it’s good enough for me.”
What this type of thinking is missing is the following question: “Does the model work when I really need it to work?”
A risk model needs to work during times of great stress because it’s supposed to answer the following questions:
- In a worst-case scenario, how much will I lose?
- Where am I likely to get hit the hardest?
It’s in attempting to answer these questions that the assumption of normality (and the reliance on historical data) fails us. We end up understating both the frequency and magnitude of the worst-case scenario.
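To make the understatement concrete, here is a minimal sketch that compares the probability of a loss of k standard deviations or more under a normal distribution with the same probability under a fat-tailed alternative. The choice of a Student-t distribution with 3 degrees of freedom, rescaled to the same standard deviation, is purely an illustrative assumption and does not come from the original post; the function names `normal_tail` and `fat_tail` are hypothetical.

```python
import math


def normal_tail(k: float) -> float:
    """P(return < mean - k * sigma) if returns are normally distributed."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def fat_tail(k: float) -> float:
    """Same probability under a Student-t with 3 degrees of freedom,
    rescaled to have the same standard deviation (illustrative assumption).
    Valid for k >= 0."""
    # A t-distribution with 3 degrees of freedom has variance 3, so a move of
    # k sigmas corresponds to t = k * sqrt(3). Plugging that into the
    # closed-form CDF for 3 degrees of freedom gives the expression below.
    return (math.atan2(1.0, k) - k / (1.0 + k * k)) / math.pi


for k in (2, 4, 6):
    p_norm, p_fat = normal_tail(k), fat_tail(k)
    print(f"{k}-sigma loss: normal={p_norm:.2e}  "
          f"fat-tailed={p_fat:.2e}  ratio={p_fat / p_norm:.1f}")
```

Under these assumptions the two distributions roughly agree for a two-sigma loss, but the fat-tailed one assigns a far higher probability to a six-sigma loss. That is the pattern described above: the normal model looks fine in ordinary conditions and understates the frequency of extreme ones.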
This is just one example of why it’s important to understand the assumptions of your model as well as the implications when those assumptions do not hold. A violated assumption doesn’t necessarily render your model unusable, but it means you need to build contingencies or even alternative models to hedge against these shortcomings.
Before transitioning into data science, Tony Yiu spent nine years in the investments industry as a quantitative researcher, where he worked on portfolio optimization and economic simulation and built numerous forecasting models to predict everything from emerging market equity returns to household spending in retirement. He now works as a data scientist at Solovis, where he uses his experience in statistics, finance, and machine learning to design and build risk analytics software for financial institutions. Tony is also a Metis Bootcamp graduate, and we’re excited to have him back with us as a contributor to the blog, where he’ll write about data science and analytics in business and industry.