Standard Deviation

(Also available in Pyret)

Students learn how standard deviation serves as data scientists' most common measure of "spread": how far the values in a dataset tend to be from their mean. When we looked at box plots, we visualized spread using the range and interquartile range. Now we’ll return to histograms and picture spread in terms of standard deviation.

Lesson Goals

Students will be able to…

apply one approach to measuring and displaying spread of a dataset
compare and contrast information displayed in a box plot and a histogram

Student-facing Lesson Goals

Let’s compare different uses for box plots and histograms when talking about data.

Measuring "Deviance" (30 minutes)

Overview

Students review the notion of spread itself, and build up to the formula by annotating histograms.

Launch

The Animal Shelter Bureau reports that the mean age of shelter cats is 3 years.

Take a look at the Animals Starter File. (You can also view the data on this page or, if you are using a printed workbook, at the front of the workbook.)

Does a mean age of 3 years translate to all of the cats being close to 3 years old? Why or why not?
- No, we cannot assume all cats are close to 3 years old. There are some outliers in the dataset.

In the activity that follows, students will look at ten cats from the shelter to consider the distribution of their ages.

Turn to Computing Standard Deviation, and complete numbers 1-3.

What did you get for the mean? Does it match what the Animal Shelter Bureau says?
- The mean is 3; yes, it matches what the Animal Shelter Bureau says.
Can you think of four ages whose mean is 3 years?
- Any four ages that add up to 12 will work!
- Possibilities include: {1,1,1,9}, {1,1,2,8}, {1,1,3,7}, {1,1,4,6}, {1,1,5,5}, {1,2,2,7}, {1,2,3,6}, {1,2,4,5}, {1,3,3,5}, {1,3,4,4}, {2,2,2,6}, {2,2,3,5}, {2,2,4,4}, {2,3,3,4}, {3,3,3,3}
Can you think of a different spread of four ages that would have the same mean?
- See above.
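How many different sets of four ages with a mean of 3 can you think of?
- If ages are whole numbers of at least 1 year, there are exactly 15 (listed above). Allowing kittens younger than 1, or ages that aren't whole numbers, gives many more.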
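To check the count of 15, here is a short Python sketch (not part of the lesson materials) that lists every set of four whole-number ages, each at least 1 year, that add up to 12. The function name `age_sets` and its parameters are illustrative; the minimum age of 1 is the same assumption the list above makes.

```python
from itertools import combinations_with_replacement


def age_sets(target_mean=3, count=4, min_age=1):
    """Return every set of `count` whole-number ages (each >= min_age)
    whose mean is target_mean. Order doesn't matter, so each set is
    produced once, sorted from youngest to oldest."""
    total = target_mean * count  # mean 3 with 4 ages means the ages sum to 12
    # The oldest possible age occurs when all the other cats are min_age.
    max_age = total - min_age * (count - 1)
    ages = range(min_age, max_age + 1)
    return [combo for combo in combinations_with_replacement(ages, count)
            if sum(combo) == total]


sets = age_sets()
print(len(sets))
for s in sets:
    print(s)
```

Changing `min_age` to 0 (allowing newborn kittens) shows how quickly the number of possibilities grows.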

Without a measure of spread, just knowing the mean doesn’t tell us enough about the shape of the data.

We already have one method of measuring spread: calculating the Five Number Summary and using it to draw a box plot. Unfortunately, that summary is built from only five values in the dataset! When summarizing a column, we’d like a measure of spread that draws on every value.

Instead of focusing on the handful of data points used in our Five Number Summary, another way to measure spread is to focus on the "typical" distance from the mean. In other words, we want to know what kind of deviation is "standard" for all the points.

Standard deviation is the most common way to summarize the spread of a quantitative column.
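The Five Number Summary's blind spot can be seen in a small Python sketch. The two nine-value datasets below are made up for illustration (they are not from the Animals dataset). Under the quartile convention used here, both have the same minimum, first quartile, median, third quartile, and maximum, so their box plots would look identical, yet one is more spread out than the other, and only the standard deviation reflects that. Quartiles here use the `method="inclusive"` option of Python's `statistics.quantiles`; other tools, CODAP included, may compute quartiles slightly differently.

```python
import statistics

# Hypothetical data, invented for this illustration.
clustered = [1, 2, 2, 3, 3, 3, 4, 4, 5]   # middle values bunched near 3
spread_out = [1, 1, 2, 2, 3, 4, 4, 5, 5]  # middle values pushed outward


def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles from the 'inclusive' method."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


for name, data in [("clustered", clustered), ("spread_out", spread_out)]:
    summary = five_number_summary(data)
    sd = statistics.stdev(data)  # uses every value, not just five of them
    print(name, summary, round(sd, 3))
```

Both lines show the summary (1, 2, 3, 4, 5), but the standard deviations come out to about 1.22 and 1.58.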

Investigate

A histogram showing the distribution of ages for 10 cats, between the ages of 1 and 8. A star labeling the mean is drawn on the x-axis at 3.

We could imagine a shelter where every cat is between 2 and 4 years old, so each cat deviates from the mean by at most 1 year! But we could also imagine a shelter with only kittens and very old cats, where some cats deviate from the mean by as much as 10 years!

How far away is each data point from 3?

A histogram showing the distribution of ages for 10 cats, between the ages of 1 and 8. A star labeling the mean is drawn on the x-axis at 3, and arrows show the distance between the mean and each point in the first interval.

In this image, we’ve drawn an arrow for each of the 1-year-old cats: four arrows run from the mean at 3 to the interval at 1, and each arrow is labeled 2.

Complete numbers 4 to 6 of Computing Standard Deviation.

Mean Absolute Deviation?

In this section of the worksheet, students will need to stretch their visual imaginations a bit! In problem number 6, they are asked to summarize all 10 distances from the mean into a single number. The goal here is for students to make an educated guess about standard deviation (SD) before learning the algorithm for computing it. Invite and encourage discussion about students' different approaches for guessing at the best summary number before sharing the key idea about standard deviation!

Students are likely to home in on the Mean Absolute Deviation, or MAD. Both SD and MAD measure variability, or "spread," starting from each value's deviation from the mean. MAD averages the absolute values of these deviations, while SD squares the deviations, combines them, and then takes a square root to return to the original units.
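A minimal Python sketch contrasting the two measures on one of the four-age sets from the Launch, {1, 1, 1, 9}. The helper `mean_absolute_deviation` is written here for illustration; `statistics.stdev` is the standard library's sample standard deviation, which divides by N-1.

```python
import statistics


def mean_absolute_deviation(values):
    """Average of the absolute distances from the mean (illustrative helper)."""
    m = statistics.mean(values)
    return sum(abs(x - m) for x in values) / len(values)


ages = [1, 1, 1, 9]  # a four-age set with mean 3, from the Launch
mad = mean_absolute_deviation(ages)
sd = statistics.stdev(ages)  # square the distances, divide by N - 1, square root
print("MAD:", mad, "SD:", sd)
```

For this set, MAD is 3.0 and SD is 4.0: squaring turns the one large distance (6 years) into 36, so that single cat pulls the standard deviation up more than it pulls up the MAD.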

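To compute the standard deviation, we square each of the N distances from the mean, add up the squares, divide the total by N-1, and then take the square root of the result. (Dividing by N-1 rather than N gives the sample standard deviation.)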
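The same steps written as a formula, where x_1 through x_N are the values and \bar{x} is their mean, followed by a worked example using the set {1, 2, 4, 5} from the Launch (mean 3):

$$
s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}
$$

$$
s = \sqrt{\frac{(1-3)^2 + (2-3)^2 + (4-3)^2 + (5-3)^2}{4 - 1}} = \sqrt{\frac{10}{3}} \approx 1.83
$$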

Finding the standard deviation by hand is laborious, so staying organized is crucial. The bottom half of the worksheet provides a partially completed table to help students keep track of each step.
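The worksheet's table can be mirrored in a short Python sketch: one row per value with its distance from the mean and the squared distance, followed by the three finishing steps. The function `sd_table` is a hypothetical helper, and the ages {1, 2, 4, 5} come from the Launch list rather than the worksheet's ten cats. The last line cross-checks the hand method against Python's `statistics.stdev`.

```python
import math
import statistics


def sd_table(values):
    """Build worksheet-style rows (value, distance from mean, squared distance)
    and finish the standard deviation calculation step by step."""
    mean = sum(values) / len(values)
    rows = [(x, x - mean, (x - mean) ** 2) for x in values]
    total_sq = sum(sq for _, _, sq in rows)   # add up the squares
    variance = total_sq / (len(values) - 1)   # divide by N - 1
    return rows, math.sqrt(variance)          # take the square root


ages = [1, 2, 4, 5]  # a four-age set with mean 3, from the Launch
rows, sd = sd_table(ages)
print("age  distance  squared")
for x, d, sq in rows:
    print(f"{x:>3}  {d:>8}  {sq:>7}")
print("standard deviation:", round(sd, 3))
print("matches statistics.stdev:", math.isclose(sd, statistics.stdev(ages)))
```

The squared distances are 4, 1, 1, and 4, so the sum is 10 and the standard deviation is the square root of 10/3, about 1.826.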

Complete numbers 7-10 of Computing Standard Deviation, where you will use the algorithm for computing standard deviation.

To compute standard deviation in CODAP, create a graph with only one quantitative attribute. Open the Measure menu, then select the button that says "Measures of Spread." (This button appears only when a single quantitative attribute is displayed.) Select Standard Deviation. Move your cursor back to the display and hover over the edge of the purple shading that appears to read the value.

A dot plot showing animals' weights, which also displays a standard deviation of 48.5.

What is the standard deviation of the weights of all the animals in our dataset?
- Approximately 48.5

For additional practice, have students complete Computing Standard Deviation (2).

Synthesize

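Can you explain why two datasets can have the same mean but different standard deviations?
- The mean measures the center of the data, while the standard deviation measures how far the values typically fall from that center. For example, {3,3,3,3} and {1,1,1,9} both have a mean of 3, but the values in the second set are much more spread out.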
What kind of dataset would have a standard deviation of zero?
- A standard deviation of zero means that every number in the sample is exactly the same.
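Both Synthesize questions can be checked with a quick Python sketch using three of the four-age sets from the Launch. All three have a mean of 3, but their standard deviations (computed with the standard library's `statistics.stdev`) differ, and the set whose values are all the same has a standard deviation of exactly zero.

```python
import statistics

# Three four-age sets from the Launch, all with a mean of 3.
datasets = {
    "all the same": [3, 3, 3, 3],
    "close together": [2, 3, 3, 4],
    "far apart": [1, 1, 1, 9],
}

for name, ages in datasets.items():
    mean = statistics.mean(ages)
    sd = statistics.stdev(ages)  # sample standard deviation (divides by N - 1)
    print(f"{name:15} mean = {mean}  SD = {sd:.2f}")
```

The printed standard deviations rise from 0.00 to about 0.82 to 4.00 as the ages move farther from 3.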