Students are introduced to box plots, learn to evaluate the spread of a quantitative column, and deepen their perspective on shape by matching box plots to histograms.
Lesson Goals 
Students will be able to…

Student-facing Lesson Goals

Materials 

Preparation 

 box plot

the box plot (a.k.a. box-and-whisker plot) is a way of displaying a distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum
 interquartile range

(IQR) is one possible measure of spread, based on dividing a dataset into four parts. The values that divide each part are called the first quartile (Q1), the median, and third quartile (Q3). IQR is calculated as Q3 minus Q1.
 maximum

the largest value in a dataset
 median

the middle element of a sorted quantitative dataset
 minimum

the smallest value in a dataset
 quartile

each of four equal groups into which a population can be divided according to the distribution of values of a particular variable.
 range

the type or set of outputs that a function produces
 range of a dataset

the distance between minimum and maximum values
 sample

a set of individuals or objects collected or selected from a statistical population by a defined procedure
 shape

the aspect of a dataset, visible in a histogram or box plot, that describes which values are more or less common
 spread

the extent to which values in a dataset vary, either from one another or from the center
Making Box Plots 30 minutes
Overview
Students are introduced to the notion of spread in a dataset. They learn about quartiles, box plots, and how to use them to talk about spread.
Launch
When we explored measures of center, we tried to answer a question about "typical" values. We considered a fact: the Animal Shelter Bureau says the average pet weighs almost 41 pounds.
How useful is this fact, really? Maybe all the pets weigh between 35 and 45 pounds, with every pet close to the mean. But maybe all the pets are super small or huge, and none of them is anywhere near the mean!
So once we have our summary for a "normal value", it’s likely we’ll ask another question: If the average pet is 41 pounds, just how typical is that?
There are differences in every class of students. Not everyone likes the same music, not everyone dresses the same, etc. So we’d expect some deviation, or spread, in any class of students! Some classes vary more than others. How do we measure the spread of a population?
Suppose we lined up all animals' weights from smallest to largest, and then split them in half by taking the median. We can learn something about the spread of the dataset by taking the median of each half, splitting the population into four equal-sized quartiles.

The first quartile (Q1) is the value for which 25% of the animals weighed that amount or less.

What animals does the third quartile represent?

The third quartile is the value for which 75% of the animals weighed that amount or less.

Besides looking at the median as center, and the spread between Q1 and Q3, we also gain valuable information from the spread of the entire dataset—that is, the distance between minimum and maximum. This is called the range of a dataset. (Note: the term “Range” means something different in statistics than it does in algebra and programming!)
Splitting a dataset into quartiles gives us five numbers that we can play with to measure spread. To summarize what we’ve seen so far:

Minimum: the smallest value in a dataset

Q1: the median of the values that fall between the minimum and Q2

Q2 (Median): the middle value in a dataset

Q3: the median of the values that fall between Q2 and the maximum

Maximum: the largest value in a dataset
Taken together, these are called the 5-Number Summary of a dataset, and this summary is one tool for calculating spread. We can use these numbers to calculate two new values:

Maximum - Minimum = Range: the distance spanned by the extreme values in the dataset

Q3 - Q1 = IQR: the Interquartile Range, or the distance spanned by the middle half of the data
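For teachers who want to check the arithmetic outside of the Bootstrap environment, the steps above can be sketched in a few lines of Python. This is only an illustration: the weights are made up, the function name five_number_summary is hypothetical, and it assumes one common convention in which the middle value is left out of both halves when the dataset has an odd number of values. Other tools may compute quartiles slightly differently.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) by taking the median of each half.

    Assumes at least two values. With an odd count, the middle value is
    excluded from both halves (an assumed convention, not the only one).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # the smaller half of the values
    upper = ordered[-half:]   # the larger half of the values
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical weights in pounds (not taken from the Animals Dataset).
weights = [2, 4, 7, 9, 12, 30, 48, 65, 90]

minimum, q1, q2, q3, maximum = five_number_summary(weights)
data_range = maximum - minimum   # Maximum - Minimum = Range
iqr = q3 - q1                    # Q3 - Q1 = IQR

print(minimum, q1, q2, q3, maximum)   # 2 5.5 12 56.5 90
print(data_range, iqr)                # 88 51.0
```

In this made-up sample the Range (88) is much larger than the IQR (51), which suggests that the extreme values sit far from the middle half of the data.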
Investigate
We can use box plots to visualize the 5-number summary, the Range, and the Interquartile Range. Below is the contract for box-plot, along with an example that will make a box plot for the pounds column in the animals-table.
box-plot :: (t::Table, col::String) -> Image
# Consumes a table and the name of the column
# to plot, and produces a box plot
box-plot(animals-table, "pounds")
Box plots divide our sample into equally-sized groups, and show where those groups are spread thin or clumped together.
Type box-plot(animals-table, "pounds") into the Interactions Area, and see the resulting plot.
This plot shows us the center and spread in our dataset according to those five numbers.

Minimum (the left “whisker”): the smallest value in the dataset. In our dataset, that’s just 0.1 pounds.

Q1 (the left edge of the box): computed by taking the median of the lower half of the values. In the pounds column, that’s 3.9 pounds.

Q2 / Median (the line in the middle): the middle value of the whole dataset. We already computed this to be 11.3 pounds.

Q3 (the right edge of the box): computed by taking the median of the upper half of the values. That’s 60.4 pounds in our dataset.

Maximum (the right “whisker”): the largest value in the dataset. In our dataset, that’s 172 pounds.

Fill in the five-number summary for the pounds column, and sketch the box plot.
What conclusions can you draw about the distribution of values in this column?

While the animals' weights range from 0.1 pounds to 172 pounds, 50% of the animals weigh 11.3 pounds or less. The animal that weighs 172 pounds may be an outlier.
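To make "may be an outlier" more concrete, here is a small Python sketch that uses the five numbers reported above for the pounds column. The 1.5 × IQR cutoff is an assumption: it is a widely used rule of thumb for flagging possible outliers, but this lesson does not define it, and the variable names are purely illustrative.

```python
# Five-number summary of the pounds column, as reported in the lesson.
minimum, q1, med, q3, maximum = 0.1, 3.9, 11.3, 60.4, 172

data_range = maximum - minimum   # distance spanned by all of the values
iqr = q3 - q1                    # distance spanned by the middle half

# Assumed convention (not defined in this lesson): values more than
# 1.5 * IQR beyond the edges of the box are flagged as possible outliers.
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

print(round(data_range, 1), round(iqr, 1))   # 171.9 56.5
print(maximum > upper_fence)                 # True
print(minimum < lower_fence)                 # False
```

Under that assumption the upper cutoff is about 145 pounds, so the 172-pound animal is flagged while the 0.1-pound animal is not. This is consistent with the box plot's long right whisker.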

Common Misconceptions
It is extremely common for students to forget that every quartile always includes 25% of the dataset. This will need to be heavily reinforced.
Synthesize

What percentage of points make up Q1?

25%


What percentage of points make up Q2?

25%


What percentage of points make up Q3?

25%


What percentage of points make up Q4?

25%


What percentage of points make up the Interquartile Range (IQR)?

50%


What percentage of points make up the Range?

100%

Optional: Have students work in pairs to complete this Box Plot Vocab Concept Map.
Interpreting Box Plots 30 minutes
Overview
Students learn how to read a box plot, and consider spread and variability. They connect this visualization of spread to what they learned about histograms.
Launch
Just as pie and bar charts are ways of visualizing categorical data, box plots and histograms are both ways of visualizing the shape of quantitative data.
Box plots make it easy to see the 5-number summary, and compare the Range and Interquartile Range. Histograms make it easier to see skewness and more details of the shape, and offer more granularity when using smaller bins.
Left skewness is seen as a long left tail in a histogram. In a box plot, it’s seen as a longer left "whisker" or more spread in the left part of the box. Likewise, right skewness is shown as a longer right "whisker" or more spread in the right part of the box.
Box plots and histograms give us two different views on the concept of shape.
             Intervals    Points-per-Interval
Box Plots    Variable     Fixed
Histograms   Fixed        Variable
Histograms: fixed intervals (“bins”) with variable numbers of data points in each one. Points “pile up in bins”, so we can see how many are in each. Larger bars show where the clusters are.
Box plots: variable intervals (“quartiles”) with a fixed number of data points in each one. A box plot treats data more like “pizza dough”, dividing it into four equal quarters and showing where the data is tightly clumped or spread thin. Smaller intervals show where the clusters are.
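The contrast between the two views can be checked on a tiny example. The Python sketch below is an illustration only: the twelve values and the bin width of 6 are made up, and the quartiles are computed as the median of each half. It counts points in fixed-width bins (the histogram view), then measures the width of each quartile interval (the box plot view).

```python
from statistics import median

# Hypothetical sorted sample of 12 values (not from the Animals Dataset).
values = [1, 2, 2, 3, 3, 4, 5, 7, 10, 14, 19, 24]

# Histogram view: fixed-width bins, variable number of points in each.
bin_width = 6
bins = [(lo, lo + bin_width) for lo in range(0, 30, bin_width)]
counts = [sum(lo <= v < hi for v in values) for lo, hi in bins]

# Box plot view: variable-width intervals, fixed number of points in each.
half = len(values) // 2   # the sample has an even count, so the halves are equal
q1, q2, q3 = median(values[:half]), median(values), median(values[half:])
cuts = [values[0], q1, q2, q3, values[-1]]
widths = [hi - lo for lo, hi in zip(cuts, cuts[1:])]
per_quartile = len(values) // 4

print(counts)        # [7, 2, 1, 1, 1]
print(widths)        # [1.5, 2.0, 7.5, 12.0]
print(per_quartile)  # 3
```

The tallest histogram bar and the narrowest quartile intervals both sit at the low end of the number line. Each display points to the same cluster in its own way.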
Kinesthetic Activity: Divide the class into groups, and give each group a ruler and a ball of playdough. Have them draw a number line from 0 to 6 with the ruler, marking off the points at 0, 3, 4, 4.5 and 6 inches. Have the groups roll the dough into a thick cylinder, divide that cylinder in half, and then split each half to form four equally-sized cylinders. The playdough represents a sample, with values divided into four quartiles. Box plots stretch and squeeze these equal quartiles across a number line, so that each quartile fills up its own interval on the number line. On their number line, students have intervals from 0 to 3, 3 to 4, 4 to 4.5, and 4.5 to 6. Have students roll their cylinders so that they fill each of these intervals, retaining a uniform thickness. They should notice that shorter intervals have thicker cylinders, and longer ones have skinnier ones. Even though a box plot doesn’t show us the thickness of the data points, we can tell that a small interval has the same amount of data "squeezed" into it as a large interval.
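The same idea can be put into numbers. The Python sketch below is a rough illustration that uses the interval endpoints from the activity and treats "thickness" as the amount of dough per inch of number line. That is a simplifying assumption, since a real cylinder's diameter grows more slowly than this. The variable names are illustrative.

```python
# Interval endpoints (in inches) from the playdough activity.
cuts = [0, 3, 4, 4.5, 6]

share = 0.25   # each quartile holds one quarter of the dough (the data)

widths = [hi - lo for lo, hi in zip(cuts, cuts[1:])]
# Dough per inch of number line: a stand-in for how tightly the data is packed.
thicknesses = [share / width for width in widths]

for (lo, hi), width, thickness in zip(zip(cuts, cuts[1:]), widths, thicknesses):
    print(f"{lo} to {hi}: width {width}, dough per inch {thickness:.3f}")
```

The shortest interval (4 to 4.5) ends up with the most dough per inch, and the longest (0 to 3) with the least, which matches what students should see on their desks.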
Investigate

Complete Identifying Shape - Box Plots and see if you can describe box plots using what you know about skewness.

To make connections between histograms and box plots, complete Matching Box Plots to Histograms.

Optional: Complete Matching Box Plots to Histograms and/or Matching Box Plots to Histograms (Desmos)
Modified Box Plots: More Statistics- or Math-oriented classes will also be familiar with modified box plots (video explanation), which remove outliers from the box-and-whisker and draw them as asterisks outside of the plot. Modified box plots are also available in Bootstrap:Data Science, using the following contract:

Synthesize
Histograms, box plots, and measures of center and spread are all different ways to get at the shape of our data. It’s important to get comfortable using every tool in the toolbox when discussing shape!
We started talking about measures of center with a single question: is "average" the right measure to use when talking about animals' weights? Now that we’ve explored the spread of the dataset, do you agree or disagree that average is the right summary?
Project Option: Stress or Chill? Students can gather data about their own lives, and use what they’ve learned in the class so far to analyze it. This project can be used as a midterm or formative assessment, or as a capstone for a limited implementation of Bootstrap:Data Science. The project description is Stress or Chill? [rubric] (You will also need the Personality Colors assessment) 
Your Own Analysis flexible
Overview
Students apply what they’ve learned to their own dataset.
Launch
What are the quantitative columns in your dataset? How are they distributed?
Are all the values pretty close together, or really spread out?
Are they clumped on the right, with a few outliers skewing to the left? Or are they clumped on the left, with a few outliers skewing to the right?
Investigate

How are the quantitative columns in your dataset distributed? Complete Data Cycle: Shape of My Dataset, and use the Data Cycle to explore two quantitative columns with box plots.

Then add these displays (and your interpretations!) to the "Making Displays" section.

Do these displays bring up any interesting questions? If so, add them to the end of the document.

Complete Shape of My Dataset, and explain the connection between measures of center and your box plots.

Complete the "Measures of Center and Spread" section of the Dataset Exploration.
Synthesize
Have students share their findings.

Were any of them surprising?

What, if any, outliers did they discover when making box plots?

Which measure of center makes the most sense for each column?
Additional Exercises
These materials were developed partly through support of the National Science Foundation (awards 1042210, 1535276, 1648684, and 1738598). Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.