
Understanding the Limitations of Your Model and Its Assumptions

By Tony Yiu • April 14, 2020

This post by Data Scientist Tony Yiu is a summary of a longer blog he published on his Medium account, which you can read in full here.

Models provide necessary simplifications to a complex world. They reduce real-world phenomena into a set of key features and relationships that allow us to explain, analyze, and sometimes even predict.  But there is a cost to these powerful benefits. Every model comes with several key assumptions, and if these assumptions are not met, the output of the model can become unreliable or even downright dangerous.

In the finance industry, an investment’s returns are famously assumed to be normally distributed, and the volatility (aka standard deviation) of those returns is assumed to be a good approximation of the investment’s risk. These assumptions flow into nearly every metric that portfolio managers use to structure their portfolios or that banks use to measure and hedge their risk. Even after the 2008 recession, when supposed six-sigma-type losses (based on the risk measures at the time) on real estate-related investments and loans wreaked havoc across the industry and almost brought down the global economy, we continue to equate volatility with risk and assume that asset returns are normally distributed.

The truth is that when markets are calm and returns are smooth, which is most of the time, asset returns (whether you measure them daily, weekly, monthly, etc.) look approximately normally distributed. So it’s easy to be fooled into thinking, “it works most of the time, so it’s good enough for me.”

What this type of thinking is missing is the following question: “Does the model work when I really need it to work?”  

A risk model needs to work during times of great stress because it’s supposed to answer the following questions:

  • In a worst-case scenario, how much will I lose?
  • Where am I likely to get hit the hardest?

It’s in attempting to answer these questions that the assumption of normality (and the reliance on historical data) fails us.  We end up understating both the frequency and magnitude of the worst-case scenario.
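To get a rough feel for how much the normality assumption can understate tail risk, here is a minimal sketch in Python. It compares the probability of a loss worse than k standard deviations under a normal distribution with the same probability under a fat-tailed alternative. The alternative used here, a Student-t distribution with 3 degrees of freedom rescaled to unit variance, is an assumption chosen purely for illustration (its tail probability has a simple closed form). It is not a model from the original post or a claim about how any particular asset behaves, and the function names are hypothetical.

```python
import math


def normal_tail(k: float) -> float:
    """P(return < -k standard deviations) under a standard normal."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def student_t3_tail(k: float) -> float:
    """Same probability under a Student-t with 3 degrees of freedom,
    rescaled to unit variance (an assumed stand-in for fat tails)."""
    # Closed-form CDF of t(3) evaluated at -k * sqrt(3); the sqrt(3)
    # factor rescales the distribution so its standard deviation is 1.
    return 0.5 - (k / (1.0 + k * k) + math.atan(k)) / math.pi


for k in (2, 4, 6):
    n, t = normal_tail(k), student_t3_tail(k)
    print(f"{k}-sigma loss: normal {n:.2e}, fat-tailed {t:.2e}, ratio {t / n:,.0f}x")
```

Under these assumptions, the two distributions roughly agree around two standard deviations, which is why the normal model can look fine in calm markets. At six standard deviations, the fat-tailed alternative assigns a probability that is orders of magnitude higher.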

This is just one example of why it’s important to understand the assumptions of your model as well as the implications of not conforming to those assumptions.  It doesn’t necessarily render your model unusable, but it means you need to build contingencies or even alternative models to hedge against these shortcomings. 


Before transitioning into data science, Tony Yiu spent nine years in the investments industry as a quantitative researcher, where he worked on portfolio optimization and economic simulation and built numerous forecasting models to predict everything from emerging market equity returns to household spending in retirement. He now works as a data scientist at Solovis, where he uses his experience in statistics, finance, and machine learning to design and build risk analytics software for financial institutions. Tony is also a Metis Bootcamp graduate, and we’re excited to have him back with us as a contributor to the blog, where he’ll write about data science and analytics in business and industry.
