Scatter Plots

Students investigate scatter plots as a method of visualizing the relationship between two quantitative variables.

Prerequisites

Displaying Categorical Data

Relevant Standards

Common Core Math Standards

8.SP.A.1: Construct and interpret scatter plots for bivariate measurement data to investigate patterns of association between two quantities. Describe patterns such as clustering, outliers, positive or negative association, linear association, and nonlinear association.
8.SP.A.2: Know that straight lines are widely used to model relationships between two quantitative variables. For scatter plots that suggest a linear association, informally fit a straight line, and informally assess the model fit by judging the closeness of the data points to the line.
HSS.ID.B.6: Represent data on two quantitative variables on a scatter plot, and describe how the variables are related.

CSTA Standards

3A-DA-11: Create interactive data visualizations using software tools to help others better understand real-world phenomena.
3B-NI-05: Use data analysis tools and techniques to identify patterns in data representing complex systems.

K-12CS Standards

6-8.Data and Analysis.Visualization and Transformation: Computer models can be used to simulate events, examine theories and inferences, or make predictions with either few or millions of data points. Computer models are abstractions that represent phenomena and use data and algorithms to emphasize key features and relationships within a system. As more data is automatically collected, models can be refined.
9-12.Data and Analysis.Visualization and Transformation: Data can be transformed to remove errors, highlight or expose relationships, and/or make it easier for computers to process.
P5: Creating Computational Artifacts

Next-Gen Science Standards

HS-SEP6-1: Make a quantitative and/or qualitative claim regarding the relationship between dependent and independent variables.

Oklahoma Standards

OK.L1.DA.IM.01: Show the relationships between collected data elements using computational models.
OK.PA.D.1.3: Collect, display and interpret data using scatterplots. Use the shape of the scatterplot to informally estimate a line of best fit, make statements about average rate of change, and make predictions about values not in the original data set. Use appropriate titles, labels and units.

Lesson Goals

Students will be able to…

consider explanatory and response roles of variables
make scatter plots by hand, given a list of (x,y) pairs
make scatter plots using Pyret
identify a possible linear relationship by looking at a point cloud
consider unusual observations in a scatter plot

Student-facing Lesson Goals

Let’s use Pyret to create scatter plots of data.

Materials

Preparation

Make sure all materials have been gathered
Decide how students will be grouped in pairs
Computer for each student (or pair), with access to the internet
Student workbook, and something to write with
All students should log into CPO (code.pyret.org) and open the "Animals Starter File" they saved from the prior lesson. If they don’t have the file, they can open a new copy.

Supplemental Resources

Language Table

Types | Functions | Values
Number | num-sqrt, num-sqr, mean, median, modes | 4, -1.2, 2/3
String | string-repeat, string-contains | "hello", "91"
Boolean | ==, <, <=, >=, string-equal | true, false
Image | triangle, circle, star, rectangle, ellipse, square, text, overlay, bar-chart, pie-chart, bar-chart-summarized, pie-chart-summarized, histogram | 🔵🔺🔶
Table | count, .row-n, .order-by, .filter, .build-column |

Glossary

explanatory variable: the variable in a relationship that is presumed to impact the other variable
response variable: the variable in a relationship that is presumed to be affected by the other variable
scatter plot: a display of the relationship between two quantitative variables, graphing each explanatory value on the x axis and the accompanying response on the y axis

Relationships Between Columns (15 minutes)

Overview

Students are finally introduced to Relate Questions, which ask about the relationship between one quantitative column and another.

Launch

Can animals' weights help explain why some are adopted quickly while others take a long time? What other factors explain why one pet gets adopted right away, and others wait months?

Theory 1: Smaller animals get adopted faster because they’re easier to care for.

How could we test that theory? Bar and pie charts are great for showing us frequencies or percentages in a categorical column. Histograms and box plots are great for showing us the shape, center, and spread of a single quantitative column. But none of these displays will help us see connections between two quantitative columns.

Investigate

Take a few minutes to look through the whole dataset, and see if you agree with Theory 1.
Could any of our visualizations or summaries provide evidence for or against the theory?
Write down your hypothesis on (Dis)Proving a Claim (Page 71), as well as a theory about how we could use this dataset to see if you’re right.

Synthesize

We’ve got a lot of tools in our toolkit that help us think about an entire column of a dataset:

We have ways to find measures of center and spread for a given quantitative column.
We have visualizations that let us see the shape of values in a quantitative column.
We have visualizations that let us see frequencies or percentages in a categorical column.

But none of these tools relates one column to another. Which columns is Theory 1 asking about?

Making Scatter Plots (20 minutes)

Overview

Students are introduced to scatter plots, which are visualizations tailored to Relate Questions about quantitative variables. They learn how to construct scatter plots by hand, and in Pyret.

Launch

Theory 1 is a claim about two columns in our dataset: it asks whether there is a relationship between pounds and weeks.

Before we can draw a scatter plot, we have to make an important decision: which variable is explanatory and which is response? In this case, are we suspecting that an animal’s weight can explain how long it takes to be adopted, or that how long it takes to be adopted can explain how much an animal weighs?

The first of these makes sense, and reflects our suspicion that weight plays a role in adoption time. The convention is to use the horizontal axis for our explanatory variable and the vertical axis for the response. Thus, pounds will be x and weeks will be y.

Investigate

We will produce our scatter plot by graphing each animal’s pounds and weeks values as a point on the x and y axes.

Complete Creating a Scatter Plot (Page 72) in your Student Workbook.

Teaching Tip

Divide the full table up into sub-lists, and have a few students plot 3-4 animals each on the board. This can be done collaboratively, resulting in a whole-class scatter plot!

Open your “Animals Starter File”. (If you do not have this file, or if something has happened to it, you can always make a new copy.)
Make a scatter plot that displays the relationship between weight and adoption time.
Are there any patterns or trends that you see here?
Try making a few other scatter plots, looking for relationships between other columns in the animals-table.
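
As a minimal sketch of the scatter plot step above: this assumes the Animals Starter File defines animals-table with columns named name, pounds, and weeks, and that the Bootstrap:Data Science library loaded by that file provides a scatter-plot function taking a table, a label column, an x column, and a y column, in that order. Under those assumptions, students might type something like this into the Interactions Area after clicking Run:

```pyret
# Assumes the Animals Starter File has been run, so that animals-table
# and the Bootstrap:Data Science scatter-plot function are available.
# Assumed argument order: table, label column, explanatory (x) column, response (y) column
scatter-plot(animals-table, "name", "pounds", "weeks")
```

The name column is used only to label each point. Swapping the last two arguments would put weeks on the horizontal axis, which reverses the explanatory and response roles chosen in the Launch.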

Synthesize

Have students share their observations. What trends do they see? Are there any points that seem unusual? Why?

Looking for Trends (20 minutes)

Overview

Students are asked to identify patterns in their scatter plots. This activity builds towards the idea of linear associations, but does not go into depth (as the following lesson does).

Launch

Shown below is a scatter plot of the relationship between the animals' ages and the number of weeks each takes to be adopted.

[Image: scatter plot of weeks to adoption vs. age]

Can you see a “cloud” around which the points are clustered?
Does the number of weeks to adoption seem to go up or down as the age increases?
Are there any points that “stray from the pack”? Which ones?

Teaching Tip

Project the scatter plot at the front of the room, and have students come up to the plot to point out their patterns.

A straight-line pattern in the cloud of points suggests a linear relationship between two columns. If we can pinpoint a line around which the points cluster (as we’ll do in a future lesson), it would be useful for making predictions. For example, our line might predict how many weeks a new dog would wait to be adopted, if it weighs 68 pounds.

Do any data points seem unusually far away from the main cloud of points? Which animals are those? These points are called unusual observations. Unusual observations in a scatter plot are like outliers in a histogram, but more complicated because it’s the combination of x and y values that makes them stand apart from the rest of the cloud.

Unusual observations are always worth thinking about

Sometimes they’re just random. Felix seems to have been adopted quickly, considering how much he weighs. Maybe he just met the right family early. Or maybe we find out that he lived nearby and got lost, and his family came to get him. In that case, we would need to think carefully about whether it’s appropriate to remove him from our dataset.
Sometimes they can give you a deeper insight into your data. Maybe Felix is a special, popular (and heavy!) breed of cat, and we discover that our dataset is missing an important column for breed!
Sometimes unusual observations are the points we are looking for! What if we wanted to know which restaurants are a good value, and which are rip-offs? We could make a scatter plot of restaurant reviews vs. prices, and look for an observation that’s high above the rest of the points. That would be a restaurant whose reviews are unusually good for the price. An observation way below the cloud would be a really bad deal.

Investigate

For practice, try making scatter plots for each of the following relationships, always expressed as “response variable vs explanatory variable”. If you see any unusual observations, try to explain them!

The pounds of an animal vs its age
The number of weeks for an animal to be adopted vs its number of legs
The number of legs vs the age of an animal
Do you see a linear (straight-line) relationship in any of these, evidenced by a cloud of points that’s clearly rising or falling from left to right? Are there any unusual observations?
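
The three practice plots above could be sketched as follows. This assumes the same scatter-plot function and animals-table as in the Animals Starter File, and that the columns are named age, pounds, legs, and weeks. Each relationship is stated as "response vs explanatory", so the explanatory column is passed as x and the response column as y:

```pyret
# Assumes the Animals Starter File has been run (animals-table and scatter-plot defined).
# Assumed argument order: table, label column, x (explanatory), y (response)

# pounds (response) vs age (explanatory)
scatter-plot(animals-table, "name", "age", "pounds")

# weeks (response) vs legs (explanatory)
scatter-plot(animals-table, "name", "legs", "weeks")

# legs (response) vs age (explanatory)
scatter-plot(animals-table, "name", "age", "legs")
```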

Synthesize

Debrief, showing the plots on the board. Make sure students see plots for which there is no relationship, like the last one!

Theory 2: Younger animals get adopted faster because they are easier to care for.

It might be tempting to go straight into making a scatter plot to explore how weeks to adoption may be affected by age. But different animals have very different lifespans! A 5-year-old tarantula is still really young, while a 5-year-old rabbit is fully grown. With differences like this, it doesn’t make sense to put them all on the same scatter plot. By mixing them together, we may be hiding a real relationship, or creating the illusion of a relationship that isn’t really there! What should we do to explore this theory?
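
One possible direction, offered only as a sketch: look at a single species at a time. The example assumes the table has a species column in which dogs are recorded as "dog". The helper is-dog is a hypothetical name introduced here for illustration, and .filter is the table method listed in the Language Table:

```pyret
# Hypothetical helper: true when a row describes a dog.
# Assumes a "species" column whose dog rows contain the string "dog".
fun is-dog(r): r["species"] == "dog" end

# Keep only the dog rows, then plot weeks (response) against age (explanatory)
scatter-plot(animals-table.filter(is-dog), "name", "age", "weeks")
```

Repeating this for other species would let students compare age and adoption time among animals with similar lifespans.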

These materials were developed partly through support of the National Science Foundation (awards 1042210, 1535276, 1648684, and 1738598). Bootstrap:Data Science by Emmanuel Schanzer, Nancy Pfenning, Emma Youndtsmith, Jennifer Poole, Shriram Krishnamurthi, Joe Politz, Ben Lerner, Flannery Denny, and Dorai Sitaram with help from Eric Allatta and Joy Straub is licensed under a Creative Commons 4.0 Unported License. Based on a work at www.BootstrapWorld.org. Permissions beyond the scope of this license may be available by contacting schanzer@BootstrapWorld.org.