Misleading Graphs: Manipulating the Y-Axis

By Roberto Reif • April 06, 2020

One of the most commonly used charts for data visualization is the bar chart, which encodes numbers into the height of a bar and is typically used to compare the relative size of two or more bars. In the example below, we can observe that the red bar is twice as tall as the blue bar, appropriately displaying the relationship between the numbers 40 and 20.  

This type of visualization makes sense when we start the y-axis at 0 because the data and the height of the bar are in agreement. However, by modifying the starting value of the y-axis, we can skew the interpretation of the chart. For example, in the chart below, the red bar appears to be more than twice as tall as the blue bar, which misrepresents the data. We achieve this by starting the y-axis at 15. Although the data values are shown, we tend to focus on the visuals before we process the numbers, and therefore draw incorrect conclusions.
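The distortion can be quantified with a little arithmetic. The sketch below is only an illustration (the function name `apparent_ratio` is made up for this example): it computes how many times taller one bar is drawn than another for a given y-axis starting value, using the 40 and 20 from the example.

```python
def apparent_ratio(taller, shorter, axis_start=0.0):
    """Ratio of the drawn bar heights when the y-axis starts at axis_start."""
    if axis_start >= shorter:
        raise ValueError("axis_start must be below the smaller value")
    # Only the part of each bar above the axis start is actually drawn.
    return (taller - axis_start) / (shorter - axis_start)


print(apparent_ratio(40, 20))      # y-axis starts at 0  -> 2.0
print(apparent_ratio(40, 20, 15))  # y-axis starts at 15 -> 5.0
```

With the axis starting at 0, the drawn ratio equals the ratio of the data. The closer the starting value gets to the smaller number, the more the drawn ratio is inflated.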

Misleading examples like the one above are frequently found in the real world. Below, you’ll see a chart that compares actual Obamacare enrollments versus the established goal. In the graph, I included three red arrows to indicate how the second bar is almost three times as big as the first bar. 

However, when we read the actual numbers (shown in the chart below), we learn that the goal of 7,066,000 enrollments is only about 18% larger than the 6,000,000 enrollments as of March 27th. Generally, we don’t think to do this mental calculation, because calculating a percentage takes far more effort than comparing the relative heights of the bars, which, in this case, leads us to believe that the gap between the two is much larger than 18%.
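The mental calculation we tend to skip is short once written down. The sketch below computes the percentage gap from the two enrollment numbers and compares it with the gap the bars suggest. It assumes the second bar is drawn exactly three times as tall as the first (the chart is described only as "almost three times"), and it borrows Edward Tufte's "lie factor" (effect shown in the graphic divided by effect in the data) as one possible measure. The helper name `percent_larger` is hypothetical.

```python
def percent_larger(larger, smaller):
    """How much larger `larger` is than `smaller`, in percent."""
    return (larger - smaller) / smaller * 100


goal = 7_066_000
enrolled = 6_000_000

# Gap in the data.
data_effect = percent_larger(goal, enrolled)

# Gap suggested by the bars. Assumption: the second bar is exactly 3x as tall.
visual_effect = percent_larger(3.0, 1.0)

# Tufte's lie factor: values far from 1 indicate a distorted graphic.
lie_factor = visual_effect / data_effect

print(f"Gap in the data:  {data_effect:.1f}%")    # 17.8%
print(f"Gap in the bars:  {visual_effect:.1f}%")  # 200.0%
print(f"Lie factor:       {lie_factor:.1f}")      # 11.3
```

Under that assumption, the bars overstate the difference by a factor of roughly eleven.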

The chart was eventually corrected as shown below, which contains both bars starting at 0.

Let’s look at another example below. In 2013, the presidential election in Venezuela had two main candidates, Nicolás Maduro (on the left) and Henrique Capriles (on the right). Maduro won the election, which is based on the popular vote. What percentage of the popular vote would you say each candidate obtained?  

Based on the image above, we would guess that Nicolás Maduro won by a landslide. However, the percentage of votes was displayed on the graph (as shown below), and we can see that this was a very close race, with a difference of only 1.59 percentage points.

Although the actual numbers of the results were presented, the heights of the bars do not match the data, and our brains are wired to focus on images over text. The y-axis, although not shown, does not start at 0, which provides the false impression that the difference between the percentage of votes obtained by each candidate was much larger. A more appropriate image for the election data with a y-axis starting at zero is presented below.

In this case, the heights of the bars show a much tighter race and match the values of the data. In summary, we should be careful to note whether the y-axis on a bar chart starts at 0; otherwise, we can be fooled into drawing the wrong conclusions about our data.
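The effect of the y-axis range can also be reproduced without a plotting library. The sketch below draws text bars for the election example twice: once on a truncated axis and once on an axis starting at 0. The vote shares used (50.66% and 49.07%) are the widely reported initial results; they are not stated in the text above, so treat them as an assumption, although they are consistent with the 1.59-point gap. The axis ranges and the helper `text_bar` are hypothetical choices for illustration.

```python
def text_bar(value, axis_start, axis_end, width=60):
    """Return a run of '#' whose length is proportional to the part of
    `value` that lies above `axis_start` on the given axis range."""
    fraction = (value - axis_start) / (axis_end - axis_start)
    fraction = max(0.0, min(1.0, fraction))  # clip to the visible axis
    return "#" * round(fraction * width)


# Assumed vote shares in percent (not given in the article text).
shares = {"Maduro": 50.66, "Capriles": 49.07}

# First a truncated axis, then an axis that starts at 0.
for axis_start, axis_end in [(48, 51), (0, 60)]:
    print(f"y-axis from {axis_start} to {axis_end}")
    for name, share in shares.items():
        print(f"{name:>8} | {text_bar(share, axis_start, axis_end)} {share}%")
    print()
```

On the truncated axis the first bar is drawn about two and a half times as long as the second, while on the zero-based axis the two bars are nearly the same length, which is what the data says.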
