
(Also available in Pyret)

Students are introduced to box plots, learn to evaluate the spread of a quantitative column, and deepen their perspective on shape by matching box plots to histograms.

Lesson Goals

Students will be able to…​

  • apply one approach to measuring and displaying spread of a dataset

  • compare and contrast information displayed in a box plot and a histogram

Student-facing Lesson Goals

  • Let’s compare different uses for box plots and histograms when talking about data.

Materials

Supplemental Materials

Preparation

  • There is an optional kinesthetic activity in this lesson that requires a ball of play-dough for each group of 3.

  • There is an optional interactive Desmos activity in the lesson. If you would like to use it, decide how you will share it with students and, if you are using our Google Slides, add the appropriate link to the slide deck. If you’re a first-time Desmos user, fear not! Here’s what you need to do.

Making Box Plots 30 minutes

Overview

Students are introduced to the notion of spread in a dataset. They learn about quartiles, box plots, and how to use them to talk about spread.

Launch

When we explored measures of center, we tried to determine the "typical" weight of animals at the shelter.

We determined that the average pet weighs almost 40 pounds.

But, how useful is this fact, really? Maybe all the pets weigh between 35 and 45 pounds, with every pet close to the mean. But maybe all the pets are super small or huge, and none are anywhere near the mean!

Once we have our summary for a "normal value", it’s likely we’ll ask: How typical is the average?
We’d expect some deviation - or spread - in any sample!

How do we measure the spread of a sample?

  • We can start by lining up all the animals' weights from smallest to largest.

  • We can compute the range of a dataset, by finding the distance between minimum and maximum values.

Note: the term “Range” means something different in statistics than it does in algebra and programming!

To learn more about how evenly distributed the data is we can:

  • Find the median, which splits the data in half

  • Split the data into four equal-sized quarters (by splitting each of these halves in half) and identify the quartiles (boundary points between these equal quarters).

  • Combine the quartiles with the minimum and the maximum to get a 5 Number Summary of the dataset:

    • Minimum: the smallest value in a dataset - it starts the first quarter

    • Q1 (lower quartile): the number that separates the first quarter of the data from the second quarter of the data

    • Q2 (median): the middle value in a dataset, which separates the second quarter of the data from the third

    • Q3 (upper quartile): the value that separates the third quarter of the data from the last

    • Maximum: the largest value in a dataset - it ends the fourth quarter of the data

  • Use the quartiles to calculate IQR (Interquartile Range), the distance spanned by the middle half of the data. IQR = Q3 - Q1
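The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the way CODAP computes its values: the function name `five_number_summary` and the sample weights are made up for this example, and the sketch assumes the "median of each half" convention for quartiles (leaving the middle value out of both halves when the count is odd). Other tools may use slightly different quartile conventions.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumes at least two values. Q1 and Q3 are the medians of the lower
    and upper halves; when the count is odd, the middle value is left
    out of both halves (one of several conventions in use).
    """
    data = sorted(values)      # line the values up from smallest to largest
    half = len(data) // 2
    lower = data[:half]        # the lower half of the values
    upper = data[-half:]       # the upper half of the values
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical weights in pounds, made up for illustration.
weights = [2, 4, 5, 8, 11, 14, 20, 48]
minimum, q1, q2, q3, maximum = five_number_summary(weights)

data_range = maximum - minimum   # distance between minimum and maximum
iqr = q3 - q1                    # distance spanned by the middle half
```

For these made-up weights, the summary is 2, 4.5, 9.5, 17 and 48, so the range is 46 pounds and the IQR is 12.5 pounds.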

Investigate

To visualize the 5 number summary, the Range, and the Interquartile Range we can use box plots, which show how the four equal quarters of data are spread out along the number line.

A sample box-and-whisker plot based on contrived data

When we see that some of the sections are narrow and others are wider, we know that the narrow sections are packed more densely. They contain exactly as many points as the wider sections, but have less room for them to spread out!

  • Which quarter of data is packed the densest in this box plot?

    • The third one

  • Which quarter of the data is the most dispersed in this box plot?

    • The fourth one

When the points are evenly distributed, the four sections of the box plot will be equal in size, like this:

Even Distribution: a box plot with four equal-size sections

Left and right skew are easy to identify from a quick glance at a box plot, with longer whiskers trailing off toward potential outliers.

Left Skew: a box plot with a longer whisker on the left

Right Skew: a box plot with a longer whisker on the right

Sometimes there is roughly the same amount of variation on the low end as on the high end. For example, the distribution of newborns who are smaller than average might mirror that of newborns who are bigger than average. We call this kind of spread symmetric.

Symmetric: a box plot with equally long whiskers, and boxes that are narrower than the whiskers, but the same width as each other

To create a box plot in CODAP, create a graph of randomly distributed points, then drag a quantitative column to the x-axis. From the Measure menu, select Box Plot. If this information is not on your Data Displays Organizer, add it now!

Box plots divide our sample into four equally populated groups, and show which of those groups are spread wide or are tightly packed.

Create a box plot in CODAP that displays the spread of Pounds.

A box plot spanning from 0 to 172, whose box spans from about 3 to 65, with the median falling around 12

  • What conclusions can you draw about the distribution of values in this column?

    • While the animals' weights range from 0.1 pounds to 172 pounds, 50% of the animals weigh 11.3 pounds or less. The animal that weighs 172 pounds may be an outlier.

  • If Q1 is the value for which 25% of the animals weighed that amount or less, what does Q3 represent?

    • The third quartile is the value for which 75% of the animals weighed that amount or less. Another way of saying that would be that it is the value for which 25% of the animals weigh that amount or more.

  • Could we make a box plot for every column in the data set?

    • No. We can only make box plots for quantitative columns.

  • Why do you think this display is sometimes called a "box and whisker plot"?

    • The lines stretching from the minimum to Q1 and from Q3 to the maximum look like whiskers!

If students are struggling to write conclusions, go over the following five number summary from the box plot they made.

  • Minimum (the left “whisker”) - the smallest value in the dataset. In our dataset, that’s just 0.1 pounds.

  • Q1 (the left edge of the box) - computed by taking the median of the lower half of the values. In the pounds column, that’s 3.9 pounds.

  • Q2 / Median (the line in the middle) - the middle quartile of the whole dataset. We already computed this to be 11.3 pounds.

  • Q3 (the right edge of the box), which is computed by taking the median of the upper half of the values. That’s 60.4 pounds in our dataset.

  • Maximum (the right “whisker”) - the largest value in the dataset. In our dataset, that’s 172 pounds.
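If it helps to make the summary concrete, the range, the IQR and the width of each quarter can be computed directly from the five numbers listed above. The short Python sketch below uses only the values reported in this lesson; the variable names are chosen for illustration.

```python
# Five-number summary for the pounds column, as reported in this lesson.
minimum, q1, med, q3, maximum = 0.1, 3.9, 11.3, 60.4, 172

# Rounded to one decimal place to match the precision of the summary.
data_range = round(maximum - minimum, 1)   # distance from minimum to maximum
iqr = round(q3 - q1, 1)                    # distance spanned by the middle half

# Width, in pounds, of the interval covered by each quarter of the animals.
edges = [minimum, q1, med, q3, maximum]
quarter_widths = [round(hi - lo, 1) for lo, hi in zip(edges, edges[1:])]
```

The range works out to 171.9 pounds and the IQR to 56.5 pounds. The quarter widths are 3.8, 7.4, 49.1 and 111.6 pounds: the lightest quarter of the animals is squeezed into less than 4 pounds, while the heaviest quarter is spread across more than 100 pounds.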

Choose another quantitative column to summarize and complete the second half of Summarizing Columns with Measures of Spread.

Common Misconceptions

It is extremely common for students to forget that the quartiles divide the data into quarters, each of which includes 25% of the dataset. This will need to be heavily reinforced.

Synthesize

  • What percentage of points fall in the first quarter?

    • 25%

  • What percentage of points fall in the second quarter?

    • 25%

  • What percentage of points fall in the third quarter?

    • 25%

  • What percentage of points fall in the fourth quarter?

    • 25%

  • What percentage of points fall in the Interquartile Range (IQR)?

    • 50%

  • What percentage of points fall within the Range?

    • 100%

Interpreting Box Plots 30 minutes

Overview

Students learn how to read a box plot, connecting this visualization of spread to what they know about histograms.

Launch

Box plots and histograms give us two different views of the shape of quantitative data.

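  • Box plots: the intervals are variable, and the number of data points per interval is fixed (25% of the data in each interval).

  • Histograms: the intervals are fixed (bins), and the number of data points per interval is variable. Points “pile up in bins”, so we can see how many are in each.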

In histograms, skewness shows up as a long tail of shorter bars to one side.

In a box plot, skewness is seen as a longer "whisker" or more spread in one half of the box.
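The contrast between the two displays can be sketched in Python. The values below are made up for illustration, the bin width of 10 is an arbitrary choice, and the quartiles use the "median of each half" approach described earlier in this lesson; none of this is meant to reproduce how CODAP draws its displays.

```python
from statistics import median

# Hypothetical values, made up for illustration and already sorted.
values = [1, 2, 2, 3, 3, 4, 5, 7, 9, 12, 18, 30]

# Histogram view: fixed-width bins, variable number of points per bin.
bin_width = 10
bin_counts = {}
for v in values:
    left_edge = (v // bin_width) * bin_width
    bin_counts[left_edge] = bin_counts.get(left_edge, 0) + 1

# Box plot view: variable-width intervals, fixed number of points in each.
half = len(values) // 2
q1 = median(values[:half])
q2 = median(values)
q3 = median(values[half:])
edges = [values[0], q1, q2, q3, values[-1]]
interval_widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

# No value in this sample sits exactly on a quartile, so a simple
# "between the edges" count is enough here.
quarter_counts = [sum(1 for v in values if lo <= v <= hi)
                  for lo, hi in zip(edges, edges[1:])]
```

Here the bins starting at 0, 10 and 30 hold 9, 2 and 1 points (the long tail of short bars to the right), while the four box plot intervals each hold 3 points but have widths of 1.5, 2, 6 and 19.5 (the long whisker on the right).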

Kinesthetic Activity

Divide the class into groups, and give each group a ruler and a ball of play-dough. Have them draw a number line from 0-6 with the ruler, marking off the points at 0, 3, 4, 4.5 and 6 inches. Have the groups roll the dough into a thick cylinder, divide that cylinder in half, and then split each half to form four equally-sized cylinders. The play-dough represents a sample, with values divided into four quarters.

Box plots stretch and squeeze these equal quarters of the data across a number line, so that they fit into their respective intervals. On their number line, students have intervals from 0-3, 3-4, 4-4.5, and 4.5-6. Have students shape their cylinders into rectangles that fill each of these intervals, and are all about 1 inch thick.

Students should notice that the play-dough is taller for narrower intervals and shorter for wider intervals. Even though a box plot doesn’t show us the thickness of the data points, we know that a small interval has the same amount of data "squeezed" into it as a large interval has spread across it.
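For classes that want to check the play-dough result with arithmetic, here is a small Python sketch. It uses the number-line marks from the activity and treats the height of each piece as the share of data per inch of interval. The variable names are illustrative.

```python
# Quartile boundaries from the number line in the activity, in inches.
edges = [0, 3, 4, 4.5, 6]

# Each quarter holds the same share of the sample.
share = 0.25

# Share of the data per inch for each interval: the same amount of
# "dough" divided by the width it has to fill.
densities = [share / (right - left) for left, right in zip(edges, edges[1:])]

tallest = densities.index(max(densities))    # index of the narrowest interval
shortest = densities.index(min(densities))   # index of the widest interval
```

The interval from 4 to 4.5 is the tallest, at 0.5 of the sample per inch, and the interval from 0 to 3 is the shortest, at about 0.083 per inch. The tallest piece is six times as tall as the shortest one, because its interval is one sixth as wide.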

Investigate

Modified Box Plots

Classes that are more statistics- or math-oriented may also be familiar with modified box plots (video explanation), which remove outliers from the box and whiskers and draw them as asterisks outside of the plot.

In CODAP, you can create a modified Box Plot by selecting both Box Plot and Show Outliers from the Measure menu.
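This lesson does not say how outliers are identified. A widely used convention, assumed here, flags any value more than 1.5 times the IQR below Q1 or above Q3. The Python sketch below applies that convention to the quartiles reported for the pounds column. The function name `outlier_fences` is hypothetical, and the exact rule a given tool applies may differ.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences under the k * IQR convention.

    Values below the lower fence or above the upper fence are flagged
    as outliers. k=1.5 is the common default, assumed here.
    """
    iqr = q3 - q1
    return (q1 - k * iqr, q3 + k * iqr)

# Q1 and Q3 for the pounds column, as reported in this lesson.
lower, upper = outlier_fences(3.9, 60.4)

heaviest = 172   # maximum weight reported in this lesson
is_outlier = heaviest < lower or heaviest > upper
```

Under this assumption the upper fence is about 145 pounds, so the 172-pound animal would be drawn as an outlier. The lower fence is negative, so no weight can fall below it.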

Now that you have the skills to interpret box plots, complete Data Cycle: Shape of the Animals Dataset.

Synthesize

Now that we’ve explored the spread of the dataset, do you think the mean is the best measure of center for the animals' weights?

Data Exploration Project (Box Plots) flexible

Overview

Students apply what they have learned about box plots to their chosen dataset. They will add three items to their Data Exploration Project Slide Template: (1) at least two box plots, (2) the corresponding five-number summaries, and (3) any interesting questions they develop.

To learn more about the sequence and scope of the Exploration Project, visit Project: Dataset Exploration. For teachers with time and interest, Project: Research Capstone is an extension of the Dataset Exploration, where students select a single question to investigate via data analysis.

Launch

Let’s review what we have learned about making and interpreting box plots.

  • Does a box plot display categorical or quantitative data? How many columns of data does a box plot display?

    • Box plots display a single column of quantitative data.

  • How are box plots similar to histograms? How are they different?

    • Box plots and histograms give us two different views on the concept of shape. Histograms have fixed intervals ("bins") with variable numbers of data points in each one. Box plots have variable intervals ("quartiles") with a fixed number of data points in each one.

  • A box plot lets us visualize the five-number summary. What does the five-number summary tell us about the column of data?

    • The five-number summary includes the minimum, median, and maximum. It also includes the median of the lower half of the values (Q1) and the median of the upper half of the values (Q3).

Investigate

Let’s connect what we know about box plots to your chosen dataset.

Students have the opportunity to choose a dataset that interests them from our List of Datasets in the Choosing Your Dataset lesson.

  • Open your chosen dataset starter file in CODAP.

  • Remind yourself which two columns you investigated in the Measures of Center lesson and make a box plot for one of them.

  • What question does your display answer?

    • Possible responses: How is the data for a certain column distributed? Are the values close together or really spread out? Are there any outliers?

  • Now, write down that question in the top section of Data Cycle: Shape of My Dataset.

  • Then, complete the rest of the data cycle, recording how you considered, analyzed and interpreted the question.

  • Repeat this process for the other column you explored before (and any others you are curious about).

If students want to investigate new columns from their dataset, they will need to copy/paste additional Measures of Center and Spread slides into their Exploration Project and calculate the mean, median and modes for the new columns.

Confirm that all students have created and understand how to interpret their box plots. Once you are confident that all students have made adequate progress, invite them to access their Data Exploration Project Slide Template from Google Drive.

  • It’s time to add to your Data Exploration Project Slide Template.

  • Find the box plot slide in the "Making Displays" section and copy/paste your first box plot here. Duplicate the slide to add your other box plots.

  • Add the five-number summaries from these plots to the corresponding "Measures of Center and Spread" slides.

  • Be sure to also add any interesting questions that you developed while making and thinking about box plots to the "My Questions" slide at the end of the deck.

Synthesize

  • What shape did you notice in your box plots?

  • Did you discover anything surprising or interesting about your dataset?

  • What, if any, outliers did you discover when making box plots?

  • When you compared your findings with others, did you make any interesting discoveries? (For instance: Did everyone find outliers? Was there more or less similarity than expected?)

Additional Exercises

These materials were developed partly through support of the National Science Foundation (awards 1042210, 1535276, 1648684, 1738598, 2031479, and 1501927). Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.