Correlations

Standards in this Lesson

Common Core Math Standards

8.SP.A.1: Construct and interpret scatter plots for bivariate measurement data to investigate patterns of association between two quantities. Describe patterns such as clustering, outliers, positive or negative association, linear association, and nonlinear association.
8.SP.A.2: Know that straight lines are widely used to model relationships between two quantitative variables. For scatter plots that suggest a linear association, informally fit a straight line, and informally assess the model fit by judging the closeness of the data points to the line.
HSS.ID.B.6: Represent data on two quantitative variables on a scatter plot, and describe how the variables are related.
HSS.ID.C.8: Compute (using technology) and interpret the correlation coefficient of a linear fit.
HSS.ID.C.9: Distinguish between correlation and causation.

CSTA Standards

1B-DA-06: Organize and present collected data visually to highlight relationships and support a claim.
2-DA-09: Refine computational models based on the data they have generated.
3B-NI-05: Use data analysis tools and techniques to identify patterns in data representing complex systems
3B-NI-07: Evaluate the ability of models and simulations to test and support the refinement of hypotheses.

K-12CS Standards

6-8.Data and Analysis.Visualization and Transformation: Computer models can be used to simulate events, examine theories and inferences, or make predictions with either few or millions of data points. Computer models are abstractions that represent phenomena and use data and algorithms to emphasize key features and relationships within a system. As more data is automatically collected, models can be refined.

Oklahoma Standards

OK.L1.DA.IM.01: Show the relationships between collected data elements using computational models.
OK.PA.D.1.3: Collect, display and interpret data using scatterplots. Use the shape of the scatterplot to informally estimate a line of best fit, make statements about average rate of change, and make predictions about values not in the original data set. Use appropriate titles, labels and units.

Textbook Alignment

IM Algebra 1

IM.Alg1.3.8: Using the Correlation Coefficient
IM.Alg1.3.7: The Correlation Coefficient
IM.Alg1.3.5: Fitting Lines

IM 8 Math™

IM.8.6.5: Describing Trends in Scatter Plots
IM.8.6.4: Fitting a Line to Data

Connected Math

CMP.8.1: Thinking with Mathematical Models: Linear and Inverse Variations

Practices in this Lesson

Science and Engineering

SEP.3: Planning and Carrying Out Investigations

Math Lang. Routines

MLR.7: Compare and Connect

Math

MP.4: Model with mathematics
MP.3: Construct viable arguments and critique the reasoning of others

Students deepen their understanding of scatter plots, learning to describe and interpret direction and strength of linear relationships.

Lesson Goals

Students will be able to…

Confirm if a scatter plot appears linear
Understand how correlation assesses direction in a linear relationship
Understand how correlation measures strength in a linear relationship

Student-facing Lesson Goals

Let’s explore scatter plots and what they can tell us about data relationships.

Materials

Preparation

All students should log into code.pyret.org (CPO) and open their saved "Animals Starter File". If they don’t have the file, they can open a new one from Animals Starter File.

Supplemental Resources

Glossary

correlation: a single number somewhere between -1 and +1 that reports the direction and strength of the linear relationship between two quantitative variables (also known as the r-value)
direction: the relationship between two quantitative variables: either they increase/decrease together or one may increase while the other decreases
form: the shape of a relationship between two quantitative variables: whether the two variables together vary linearly or in some other way
linear regression: a type of analysis that models the relationship between two quantitative variables. The result is known as a regression line, or line of best fit.
linear relationship: sequences that change at a constant rate, or points forming a straight line on a graph
r: a number between −1 and 1 that measures the direction and strength of a linear relationship between two quantitative variables (also known as correlation value)
strength: of a relationship between two quantitative variables: how much the value of one variable tells us about the value of the other

Correlations have Form 5 minutes

Overview

Students identify and make use of patterns in scatter plots, learning to characterize them as being linear, curved, or showing no clear pattern. Determining that a form is linear is a prerequisite for proceeding to correlation and linear regression.

Launch

Students have learned several ways to analyze a single quantitative variable, such as age or pounds of the animals in our dataset:

reporting the center
computing the spread
describing the shape of the distribution

Together, those numbers tell us what value is typical, how much the values vary, and what kind of values are usual or unusual.

But those analyses tell us nothing about the relationship between animals' ages and weights. In order to understand such relationships, we have to expand our view from one column to two. This goes hand-in-hand with expanding our display from a 1-dimensional histogram or box plot to a 2-dimensional scatter plot.

Rather than summarizing each distribution in one dimension, we can search for a linear relationship between two quantitative variables. But linear relationships only make sense if the scatter plot follows a straight-line pattern. So the first thing we need to ask is whether the form of the relationship is linear or not.

Form indicates whether a relationship is linear, non-linear or undefined.

Investigate

Some patterns are linear, and cluster around a straight line sloping up or down.
Image: a scatter plot showing a linear (straight-line) relationship

Some patterns are non-linear, and may look like a curve or an arc.
Image: a scatter plot showing a non-linear (curved) relationship

And sometimes there is no relationship or pattern at all!
Image: a scatter plot showing no relationship

Turn to Identifying Form, Direction and Strength, and complete just the first question for each scatter plot, identifying whether the relationship is linear, non-linear or if there’s no relationship at all.

Synthesize

Which scatter plots seem to have linear relationships?
- A, C, D, and F seem to have linear relationships.
Which scatter plots seem to have non-linear relationships?
- Scatter plot E seems to have a non-linear relationship.
Which scatter plots seem to have no relationships?
- Scatter plot B seems to have no relationship.

Data Scientists use their eyes all the time! It doesn’t make sense to search for correlations when there’s no pattern at all, and summarizing with a correlation only makes sense for linear relationships!
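
For teachers who want to see why the eye check comes first, here is a minimal sketch in Python (not Pyret, and not part of the Animals Starter File). It computes the r-value from its standard definition for two made-up sets of points: one lying exactly on a line and one lying exactly on a U-shaped curve. The helper name `pearson_r` and all of the data are assumptions invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Correlation (r-value) of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, chosen only to illustrate the point
xs = [-3, -2, -1, 0, 1, 2, 3]
line_ys = [2 * x + 1 for x in xs]   # points exactly on a rising line
curve_ys = [x * x for x in xs]      # points exactly on a U-shaped curve

r_line = pearson_r(xs, line_ys)     # very close to +1
r_curve = pearson_r(xs, curve_ys)   # very close to 0, despite the clear pattern
```

The curved points follow a perfectly predictable pattern, yet their r-value is essentially 0. A correlation summarizes only linear relationships, so it can badly understate a strong non-linear one.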

Going Deeper

In an AP Statistics class or full-year Data Science class, it’s appropriate to discuss non-linear relationships here. In a dedicated computer science class, it may also be appropriate to talk about transforming the x- or y-axis (using .build-column!) via a quadratic, exponential, or logarithmic function and then looking for a linear pattern in the resulting scatter plot. All of these are extensions to the materials presented here.
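
As a rough sketch of the transformation idea, the Python example below (not Pyret, and it does not use `.build-column`) takes made-up exponential data, applies a logarithm to the y-values, and compares the r-value before and after. The helper name `pearson_r` and the data are assumptions chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Correlation (r-value) of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up exponential growth: y triples at x = 0 and doubles at each step
xs = list(range(8))
ys = [3 * 2 ** x for x in xs]

# Transform the y-values with a logarithm, producing a new "column"
log_ys = [math.log(y) for y in ys]

r_raw = pearson_r(xs, ys)       # noticeably below 1: the raw pattern is curved
r_log = pearson_r(xs, log_ys)   # very close to 1: the transformed pattern is a line
```

After the transformation the points fall on a straight line, so the usual tools for linear relationships apply to the transformed column rather than the original one.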