Students use linear models to investigate relationships in demographic data about US states using an inquiry-based approach, involving hypothesizing, experimental and computational modeling, and sense-making.

Lesson Goals

Students will be able to…​

  • Read and interpret real-world data, presented in a scatter plot

  • Describe correlations as Strong, Moderate, or Weak

  • Model linear relationships using linear functions

Student-facing Lesson Goals

  • Let’s use Pyret to make predictions with linear models

Materials

Supplemental Materials

Key Points For The Facilitator

  • The math content of this lesson is primarily a review from Algebra 1.

  • This lesson introduces necessary programming skills while students practice exploring linear relationships.

  • This lesson establishes a structure for exploring relationships in data that students will make use of in subsequent explorations:

    • build a model from samples

    • fit a model

    • improve upon the model

  • Two of the starter files referenced in this file pull data from the same spreadsheet.

🔗Looking for Patterns 45 minutes

Overview

Students explore the State Demographics dataset and, building on a discussion of the displays they previously made using the animals dataset, recognize the unique opportunity scatter plots offer for exploring relationships between columns.

Launch

We’re going to search for relationships within a dataset about all the states in the US. But first, let’s take a moment to (1) develop confidence in our ability to use functions for working with tables and making displays, and (2) build familiarity with a new dataset that we are going to spend a lot of time with.

  • What did you Notice and Wonder?

  • What did you learn about defining rows in Pyret?

    • Example: x = row-n(states-table, 0) will make the name x have the value of the first row in the table (the index starts at zero!).

  • How would you define a name y to be the value of the second row in the table? The third?

    • y = row-n(states-table, 1) for the second row. Change the 1 to a 2 for the third.

In math, 𝑥 = 4 will define a variable 𝑥 to be the value 4.

Any time we see 𝑥 after that, we can substitute in the value of 4.

This works in Pyret, too. But in Pyret, values can be more than just numbers!

In this file, the variables alabama and alaska are defined as rows from the table.
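As a minimal sketch of this idea, the code below defines one name as a number and two names as rows. The table mini-table and its values are made up for illustration (they are not the State Demographics data), and the sketch uses core Pyret's .row-n table method, which is assumed here to play the same role as the row-n function used in the starter file.

```pyret
# A tiny stand-in table with invented values (NOT the real dataset)
mini-table = table: state, pct-college, income
  row: "A", 20, 40000
  row: "B", 30, 55000
  row: "C", 40, 66000
end

# A name can stand for a number, just like in math...
x = 4

# ...or for a whole row. Indexing starts at zero!
first-row = mini-table.row-n(0)
second-row = mini-table.row-n(1)

check:
  x + 1 is 5
  first-row["state"] is "A"
  second-row["income"] is 55000
end
```

Once a name is defined, Pyret substitutes its value wherever the name appears, whether that value is a number or a row.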

Debrief the rest of the page with students. Then, initiate a conversation about the various column names, ensuring that students understand all of the terminology. Later in the lesson, students will examine relationships between income and education. We recommend posing the questions below to ensure that they are ready to do so.

  • What columns in this dataset have to do with wealth?

    • pct-in-poverty, poverty-rate, median-income, per-capita-income

  • What columns are about education levels?

    • college-or-higher, hs-or-higher

Investigate

Before we dig deeper into the State Demographics data, let’s think back to the animals at the shelter in order to introduce some new data science concepts.

  • Would you imagine that younger animals get adopted faster than older animals? Why or why not?

    • The goal here is to have an open discussion and draw students in. Allow students to share their opinions freely.

    • (For example: Yes, baby animals get adopted quickly because they’re so cute! No, animals require too much work when they are young.)

  • What kind of data does the age variable represent? What about weeks?

    • Both age and weeks are quantitative variables.

  • What kind of display would help us analyze the relationship between age and adoption time?

    • Again, solicit students’ ideas and discuss why each display type would or would not work.

Pie and bar charts help us see the frequency of values in a single categorical column.

There are other displays, like histograms and box plots, that help us explore the distribution of values in a single quantitative column.

But what we really want is a display that will help us search for a relationship between two quantitative columns, and that’s exactly what scatter plots do!

Scatter plots reveal the relationship between two columns by plotting one on the x-axis and the other on the y-axis.

Before we can draw a scatter plot, we have to make an important decision: which variable do we think of as the cause - called the explanatory variable - and which is the effect (response variable)?

In this case, which do we suspect is the cause and which is the effect: age or time-to-adoption?

We suspect that age affects the adoption time, so we’ll use age as our explanatory variable and weeks as our response variable.

Why not Independent/Dependent?

Data Science relies on concepts in Probability. When discussing events in Probability, we may say two events are dependent or they are independent. For instance, we might say having a job and having a college education are dependent because one affects the probability of the other occurring. When discussing relationships in Data Science, we want to go further than just saying two things are connected: We want to consider if they are causally connected: one of them affects the other, but not vice versa. Therefore, we think of one variable as being explanatory and the other as the response variable. For instance, having a college education would be our explanatory variable, and having a job would be the response.

It’s customary to use the horizontal axis for our explanatory variable and the vertical axis for the response variable. Each row in the dataset will be represented by a point on the scatter plot with age for 𝑥 and weeks for 𝑦.

  • It’s time to dig back into the State Demographics data.

  • Which states do you want to focus on? (Pick our state, a neighboring state, and/or a state you’ve always wanted to visit!)

If students aren’t familiar with neighboring states, here’s a useful map!

Come to a consensus about which states your students will explore. When more students are looking into the same data, you’ll find much richer class discussions!

Encourage students to first think about which columns might be related, and then create the scatter plot to search for this relationship, rather than making scatter plots for random pairs of columns. The dataset is designed so that students will quickly begin searching for relationships between varying levels of education and income, and there are linear relationships in each of these.

Exploring the States Dataset

The Preview: State Demographics Starter File has a lot of interesting data, and endless possible combinations of columns to explore. But randomly smashing columns together in a scatter plot is not the habit we want students to cultivate! Instead, make sure students are actually talking with their partners about why two columns may or may not be related.

Making sense: can students predict these relationships, and explain their thinking?
(If so, probably not worth having them spend time on more than one of them!)

  • pop-2010 vs. pop-2020.

  • pop-2020 vs. num-households

  • num-housing-units vs. num-households

  • num-households vs. num-veterans

Surprises in the District of Columbia: DC often shows up as an outlier or extreme value. But why? Here are a few relationships to spark students' interest.

  • pct-college-or-higher vs. pct-in-poverty

  • median-income vs. pct-college-or-higher

  • median-income vs. pct-home-owners

  • pct-college-or-higher vs. pct-home-owners

  • pct-home-owners vs. num-housing-units

  • median-income vs. per-capita-income

Synthesize

  • Share your scatter plots with one another. (Perhaps by copying and pasting scatter-plot displays into a shared document and then labeling those displays?)

  • What possible relationships did you find?

  • What did you learn about the state(s) you decided to focus on?

  • Did you and your classmates use similar words to describe the scatter plots you came up with? If so, what were they?

Note: Students will acquire the formal vocabulary that data scientists use to assess relationships in the next section of this lesson, which is all about identifying form, direction, and strength.

🔗Describing Patterns 45 minutes

Overview

Students identify and make use of correlations in scatter plots. They learn to characterize their form as being linear, curved, or showing no clear pattern. They learn that linear patterns have direction, and they learn how to report strength (as well as direction) with a number called the "correlation."

Launch

Scatter plots let us visualize the relationship between two quantitative columns. If no relationship exists, the points in the scatter plot just appear as a shapeless cloud. But if there is a relationship, the points will form some kind of pattern. When we build scatter plots, we are searching for patterns between two quantitative variables.

These patterns can be described by three terms: form, direction and strength.

Form

A scatter plot showing a linear (straight-line) relationship

A scatter plot showing a nonlinear (curved) relationship

A scatter plot showing no relationship

Some patterns are linear, and cluster around a straight line sloping up or down.

Some patterns are nonlinear, and may look like some kind of curve.

And sometimes there is no relationship or pattern at all!

Form indicates whether a relationship is linear, nonlinear or undefined.

Have students turn to Linear, Non-linear, or Bust? and decide whether each of the scatter plots could be modeled by a linear relationship, a nonlinear relationship, or that there doesn’t appear to be a pattern.

Direction

If the relationship clusters around a straight line, we can talk about direction.

A scatter plot, having a tight point cloud with a positive slope

Positive: The line slopes up as we look from left-to-right. Positive relationships are by far the most common because of natural tendencies for variables to increase in tandem. For example, “the older the animal, the more it tends to weigh”.

A scatter plot, with a tightly-clumped point cloud with a negative slope

Negative: The line slopes down as we look from left-to-right. For example, “the older a child gets, the fewer new words he or she learns each day.”

Only linear relationships have direction.

Not every shape has a direction! For example, a curve can start out sloping upwards, but then peak and slope downwards.

Strength

A scatter plot, with a tightly-clumped point cloud with a negative slope

A relationship is strong if knowing a data point’s x-value gives us a very good idea of what its y-value will be (knowing a student’s age gives us a very good idea of what grade they’re in). A strong linear relationship means that the points in the scatter plot are all clustered tightly around an invisible line.

A scatter plot, with a loosely-clumped point cloud with a negative slope

A relationship is weak if x tells us little about y (a student’s age doesn’t tell us much about their number of siblings). A weak linear relationship means that the cloud of points is scattered very loosely around the line.

Strength indicates how closely the two variables are related.

Investigate

Now that you’ve dug into the role that form, direction and strength play in assessing a relationship between two quantitative variables, it’s time to put those concepts to work!

  • We need to train our eyes to look for form so that we know whether we’re justified in fitting a line to the scatter plot and reporting a correlation, neither of which would be appropriate if the form is non-linear.

  • Let’s start by practicing matching scatter plots to their descriptions on Identifying Form, Direction and Strength (Matching).

  • Then turn to Identifying Form, Direction and Strength and work with your partner or group to describe each scatter plot.

  • You may want to review the matching answers before having students complete the second page.

  • For students who are struggling, hearing what their peers are looking for is especially helpful at this stage, so be sure to have students explain their thinking for these activities.

  • Some of the answers are not so clear-cut, and students may disagree about what constitutes a "strong" vs. "weak" correlation. We’ve tried to choose scatter plots that clearly fall into one category or the other, but without diving into the algorithm for linear regression students may find this exercise somewhat subjective…​ and that’s ok!

Return to Looking for Patterns, and apply what you’ve learned about Form, Direction and Strength to complete Part 2.

Common Misconceptions

  • Students often conflate strength and direction, thinking that a strong correlation must be positive and a weak one must be negative.

  • Students may also falsely believe that there is ALWAYS a correlation between any two variables in their dataset.

  • Students often believe that strength and sample size are interchangeable, leading to mistaken assumptions like "any correlation found in a million data points must be strong!" Or "there are only a few data points, so the relationship must be weak!" (Sample size only plays a role if we’re trying to generalize to what’s true for a larger population.)

Synthesize

  • What relationships did you explore in the states dataset?

  • Which appeared to have strong correlations? Were they positive or negative?

  • Were any of these relationships a surprise? Why or why not?

🔗Building Linear Models 45 minutes

Overview

Building on prior knowledge of linear functions, students learn to find the line of best fit to model the relationship in a scatter plot that looks linear. This yields a predictor function that tells what y-value to expect for a given x-value. Students also learn how to quantify how closely a model fits a dataset, using residuals and S as a measure of how well their models fit the data.

Launch

Before we learn to fit linear models to scatter plots, let’s review. What do you remember about linear functions?

We’d expect students to be able to surface much of the following:

  • Linear functions look like straight lines.

  • Vertical lines are not functions, because a single x-value is paired with more than one y-value. Their slope is also undefined, because their horizontal change is zero.

  • The steepness of a line can be described by its slope (or constant rate of change).

  • The slope can be calculated from any two points.

  • Students may remember the slope as $$\displaystyle \frac{\text{change in } y}{\text{change in } x}$$ or $$\displaystyle \frac{\text{rise}}{\text{run}}$$ or $$\displaystyle \frac{y_2 - y_1}{x_2 - x_1}$$.

  • The point where the line crosses the y-axis is called the y-intercept.

  • The x-coordinate of the y-intercept is always zero, e.g. (0, 𝑦).

  • Diagonal lines have both a y-intercept and an x-intercept.

  • Horizontal lines have a constant rate of change of zero.

A table with columns for x (1,2,3,4) and y (5,7,9,11), and arrows showing what is added between the y-values (2,2,2,2).

Linear relationships grow by fixed amounts, meaning that the difference between two y-values will always be the same over identical horizontal intervals. In the table shown to the right, you can see arrows pointing out the "jumps" between y-values for intervals of 1. Each jump is the same size.

If the rate of change is constant, the relationship is linear.

  • Try comparing intervals of 2, instead of intervals of 1.

  • Is the difference between y-values from 𝑥 = 1 to 𝑥 = 3 the same as the difference between y-values from 𝑥 = 2 to 𝑥 = 4?

    • Yes. When x increases by 2, y increases by 4.
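The constant rate of change in that table can be checked with a short Pyret sketch. The function names slope and table-line are chosen here for illustration. The rule 2𝑥 + 3 is worked out from the table's values (slope 2, and a y-value of 5 when 𝑥 is 1).

```pyret
# Slope between two points (x1, y1) and (x2, y2)
fun slope(x1, y1, x2, y2):
  (y2 - y1) / (x2 - x1)
end

# A linear function that matches the table: x = 1, 2, 3, 4 and y = 5, 7, 9, 11
fun table-line(x):
  (2 * x) + 3
end

check:
  # Intervals of 1 and intervals of 2 give the same slope
  slope(1, 5, 2, 7) is 2
  slope(1, 5, 3, 9) is 2
  slope(2, 7, 4, 11) is 2
  # The function reproduces every row of the table
  table-line(1) is 5
  table-line(2) is 7
  table-line(3) is 9
  table-line(4) is 11
end
```

Whichever pair of points we pick, the slope comes out the same. That is what makes the relationship linear.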

Students are about to be asked to write the Slope-Intercept form of the line, given two points in our states dataset. If your students haven’t done much work with calculating slope and y-intercept from pairs of points recently, we recommend prepping them for success by having them complete Defining a Linear Function from Two Points.

Investigate

A scatter plot for all 50 states. The percentage of people in each state with a college degree or higher is shown on the x-axis, and the median household income on the y-axis. The point cloud shows a moderate, positive linear relationship.

  • What do you notice about the Form of this scatter plot? What pattern do you see?

    • This scatter plot appears to show a positive, linear relationship:
      States with higher percentages of college graduates tend to have higher median household incomes.

Suppose the United States were to add a new state.
Based on the data for the existing 50 states (plus DC!)…​

  • What median household income would you predict, if exactly 30% of the new state’s citizens had attended college?

    • Answers will vary, but should be above $50,000 and below $60,000.

  • What would you predict if 20% had attended college?

    • Answers will vary, but should be around $40,000.

  • If 40% had attended college?

    • Answers will vary, but should be upwards of $65,000.

Screenshot of the right side of a Pyret scatter-plot where x-min, x-max, y-min, and y-max can be adjusted and Redrawn.

Let students discuss, and explain their thinking.

  • If possible, mark off a single point for each of the hypothetical percentages, then connect those points to show a straight line.

  • Note that some of these new points would require changing the x-min, x-max, y-min and/or y-max of our display, which we can do by typing in the cells on the right side of the scatter plot and clicking "Redraw".

When we see patterns in data, we can use those patterns to make predictions based on that data. We can even draw a line to show all the possible predictions at once! These predictions represent our "best guess" at the underlying relationship in the data, as we try to model that relationship using math.

Let’s find a line to model the relationship between the percent of the population with college degrees and median income.

If your students could use more support for finding the equation of the line between two points, direct them to the scaffolded version of Build a Model from Samples: College Degrees v. Income (Scaffolded) instead.
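For facilitators who want to preview the computation, here is a minimal sketch of building a Slope-Intercept model from two sample points. The two points are hypothetical values chosen for illustration. They are not the actual Alabama or Alaska rows, and the name two-point-model is invented here.

```pyret
# Two HYPOTHETICAL sample points: (pct-college-or-higher, median-income)
x1 = 20
y1 = 40000
x2 = 40
y2 = 66000

# Slope: change in y divided by change in x
m = (y2 - y1) / (x2 - x1)

# y-intercept: solve y1 = (m * x1) + b for b
b = y1 - (m * x1)

# The model predicts a median income from a college percentage
fun two-point-model(x):
  (m * x) + b
end

check:
  m is 1300
  b is 14000
  # The model passes through both sample points...
  two-point-model(20) is 40000
  two-point-model(40) is 66000
  # ...and makes a prediction for a value in between
  two-point-model(30) is 53000
end
```

A model built this way always fits its two sample points perfectly. How well it fits every other row in the dataset is a separate question.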

Synthesize

Confirm that students were able to successfully compute slope and y-intercept, define and test al-ak(x) in Pyret, and test how well al-ak(x) predicted several states' median income given the percentage of the population with at least a college degree.

  • Why wasn’t the Alaska-Alabama model a good fit for the rest of the data?

    • Because Alaska is an outlier that falls pretty far above the line of best fit.

  • If we had chosen two other points from which to build our model, could we have done better? Which points did you choose, and why?

    • Answers will vary, but West Virginia and Massachusetts could be a good option.

Write those two states somewhere on Build a Model from Samples: College Degrees v. Income. You’ll want to remember them for later!

🔗Fitting Linear Models 45 minutes

Overview

Students confront the notion of "model fitness". How do we measure how well a model fits? How do we determine which of two models is best? First they’ll consider two models for a simple dataset and brainstorm how we could measure which fits better. Then they’ll test out their linear models using a new Pyret function called fit-model, which draws the residuals and computes the Standard Deviation of the Residuals (𝑆).

Launch

In the previous section, we came up with a linear model for the relationship between pct-college-or-higher and median-income, but it definitely wasn’t the best model.

How do we even measure how good a model is?

  • What criteria did you come up with for how to assess whether or not a model is a good fit for the data?

    • Answers will vary. Ideas might include:

    • The points should be as evenly distributed around the model as possible.

      • We could see how the number of points above the line and below the line compare.

      • We could measure the distance between the points and the line and try to make sure the average distance above is balanced with the average distance below.

  • How could we measure the distance between the data points and the linear model?

    • Answers will vary. Ideas might include:

    • By drawing vertical lines connecting each data point to the linear model.

    • By drawing horizontal lines connecting each data point to the linear model.

    • By drawing diagonal lines connecting each data point to the linear model.
      Push students to recognize that, for this measurement to be useful, the diagonal lines would have to be perpendicular to the linear model!

    • By drawing squares with one corner on the data point and the opposite corner on the linear model.

Pyret has a special function called fit-model that graphs whatever function we give it on top of a scatter plot of the dataset!

  • Take a look at the contract for fit-model in your contracts page.
    What is its Domain?

    • Like scatter-plot, it consumes columns for our labels, our 𝑥s, our 𝑦s… Additionally, it consumes a function.

  • Open the Cheerios Starter File and click "Run" to test out fit-model with the dataset and functions you were just looking at.

  • What do you Notice? What do you Wonder?

fit-model(cheerios-table, "id", "day", "cheerios-on-the-floor", f)

fit-model(cheerios-table, "id", "day", "cheerios-on-the-floor", g)

A plot fitting the model f on top of the cheerios dataset showing the residuals between the data points and the model.

A plot fitting the model g on top of the cheerios dataset showing the residuals between the data points and the model.

Scatterplot with a regression line. A vertical line is drawn between the predicted point on the line and the actual datapoint on the scatter plot, to show the size of the residual for that point.

When you graph your model in Pyret, you can see that:

  • some of the points are close to the line ("real" 𝑦 is close to "predicted" 𝑦)

  • some points are quite far away ("real" 𝑦 is far from "predicted" 𝑦)

The difference between any real 𝑦 and predicted 𝑦 is called the residual, and it measures how far off that one point in the model is from the actual data.
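A residual can be computed by hand with a very small function. In this sketch the model sketch-model and the sample points are invented for illustration. They are not the f or g from the Cheerios Starter File. The sign convention used (real minus predicted) is an assumption: the residual is positive when the data point sits above the model.

```pyret
# A made-up linear model, for illustration only
fun sketch-model(x):
  (2 * x) + 3
end

# Residual: the real y minus the y that the model predicts
fun residual(x, real-y):
  real-y - sketch-model(x)
end

check:
  residual(1, 5) is 0    # the point (1, 5) is exactly on the model
  residual(2, 9) is 2    # the point (2, 9) is 2 above the model
  residual(3, 6) is -3   # the point (3, 6) is 3 below the model
end
```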

  • There are three terms in the legend at the bottom. What do they refer to?

    • The blue line is the model.

    • The red dots are the data from the data set.

    • Residuals refer to the vertical black lines connecting the data points to the model, representing the distance between the data and the value the model predicts. They vary in length depending on how far above or below the model the data is situated.

  • Compare the fit-model display for f to the fit-model display for g. How are they similar? How are they different?

    • The x-axis goes from 0 to 10 for both of them.

    • The y-axis for g stops at 9. It goes up to 20 for f.

    • Both f and g have a blue line and red dots.

    • f has significantly more red dots below the blue line than above it.

    • The data points for g more or less fill the vertical space of the display, whereas for f there are only data points in the bottom half of the display.

There are 𝑆 and 𝑅² values listed in the top left corner. You probably haven’t seen these terms before, but let’s see if we can figure out what they mean.

  • How do 𝑆 and 𝑅² compare for the two models?

    • The values are positive for both models and both 𝑆 and 𝑅² values are smaller for g than they are for f.

While the remainder of the lesson could be done using the Preview: State Demographics Starter File, you will see us refer to Fitting a Model: State Demographics Starter File from here on out. This file contains the same data, but the Definitions Area is set up to save you time. al-ak has been predefined and the other models students will be asked to define during the remainder of the lesson have been started for them.

Now is the time to make sure students Save a Copy of the file.

Heads up: Sometimes a value has too many digits to be displayed clearly. When this happens, Pyret will convert it to scientific notation. While students in an Algebra 2 class will likely have encountered scientific notation before, they may not recognize 8.23e5 as 8.23 × 10⁵. You should make sure they understand how to interpret this notation.

Pyret has a function that will compute S without drawing the graph. This may be useful, especially for students who are struggling with scientific notation:

# S :: (t :: Table, label :: String, xs :: String, ys :: String, model :: (Number -> Number)) -> Number

  • Based on the S values of the plots you created on this page, what do you think S means?

    • Answers will vary, but students should have some sense of the idea that if one model has a lower S value than another model of the same data it indicates a better fit.

There are many different tools to calculate the fitness of a model.

  • You may have heard of 𝑅, 𝑅², etc…

  • Statisticians and Data Scientists are careful to use the right tool for the job.

For our purposes, we’ll use the value of S to tell us how well or poorly our model fits.

The statistical term S refers to the Standard Deviation of the Residuals, which is a measure of how far away all of the data points are from a model.

  • The closer the data points are to the model, the smaller the residuals are. If a data point falls directly on the model, the residual is zero!

  • Smaller residuals mean a smaller S, and a better model!

  • We know that if a model fits the data perfectly, the S value would be 0.

  • Unlike other statistical measures, there is no maximum value of S, so an S value of 300 tells us something different about how well our model fits the data depending on the range of the data.

The 𝑆-value always has to be considered in the context of the range of values that the model is predicting!
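For facilitators curious about what goes into a number like 𝑆, here is a rough sketch of one common way to compute it. The data and the model are invented for illustration. The formula divides the sum of squared residuals by n - 2, which is the usual definition of the standard error of the regression. That Pyret's own S function uses exactly this formula is an assumption, not something this lesson states.

```pyret
import lists as L

# Made-up data and a made-up model, for illustration only
xs = [list: 1, 2, 3, 4]
ys = [list: 5, 9, 6, 11]

fun sketch-model(x):
  (2 * x) + 3
end

# One residual per data point: real y minus predicted y
residuals = L.map2(lam(x, y): y - sketch-model(x) end, xs, ys)

# Add up the squares of the residuals
ssr = L.fold(lam(acc, r): acc + (r * r) end, 0, residuals)

# ASSUMPTION: divide by (n - 2), then take the square root
s-value = num-sqrt(ssr / (xs.length() - 2))

check:
  residuals is [list: 0, 2, -3, 0]
  ssr is 13
  s-value is%(within-abs(0.001)) 2.5495
end
```

The y-values in this made-up dataset only span from 5 to 11, so an 𝑆 of about 2.5 is large relative to that range. The same 𝑆 would be tiny for data that spans thousands.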

A model built from Alaska and Alabama predicts that a 1 percentage point increase in college degrees is associated with a $5,613.67 increase in median household income.

  • The lowest median incomes are found in Mississippi ($39,031), Arkansas ($40,768), and West Virginia ($41,043).

  • The highest median income is found in Maryland ($73,538).

With an S-value of 36165, we know that there’s enough error in the model to predict median incomes that are off by $36,165! That’s enough to double the median income of a state or cut it in half!

Compared to the size of the incomes in this dataset, an S value of $36,165 is pretty terrible. This model should not be trusted!
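The comparison can be made concrete with a little arithmetic, using only the figures quoted above (Mississippi's $39,031 median income as the lowest, Maryland's $73,538 as the highest, and the S-value of 36165). The names in this sketch are chosen for illustration.

```pyret
# Figures quoted in the lesson text
lowest-income = 39031
highest-income = 73538
al-ak-s-value = 36165

# How wide is the spread of median incomes across the dataset?
income-range = highest-income - lowest-income

check:
  income-range is 34507
  # The typical error is larger than the entire spread of the data!
  (al-ak-s-value > income-range) is true
end
```

An 𝑆 that exceeds the whole range of the response variable means the model's typical error is bigger than the difference between the richest and poorest states.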

  • Were any of the models described terrific? How do you know?

    • Both 2 and 8

    • Because the numbers in the range were huge and the 𝑆 value was really small.

  • Were any of the models described terrible? How do you know?

    • Both 1 and 6

    • Because the 𝑆-value was big in comparison to the range.

    • For the first scenario the 𝑆-value was 300, which was the majority of the range between 0 and 400.

    • For the sixth scenario, even though the 𝑆-value was only 1, it was much bigger than any of the numbers in the range, which maxed out at two hundredths.

Going Deeper

For a discussion of why the standard error of the regression 𝑆 may provide more useful information than 𝑅², we recommend visiting this link. Further discussion of S and Residuals may be appropriate for older students, or in an AP Statistics class. We also have an entire Bootstrap:Data Science lesson on Standard Deviation.

Synthesize

  • What does it mean if 𝑆 is zero?

    • The model fits the data perfectly.

  • What does it mean if 𝑆 is 300?

    • We have no way of knowing out of context! 𝑆-values only make sense when considered in the context of the range of the dataset!

🔗Making Sense of our Best Linear Models

Overview

Students are introduced to a new Pyret function called lr-plot, which uses linear regression to fit the best possible linear model to the data.

Launch

We’ve learned how to measure how well linear models fit the data and to decide which linear model does a better job of predicting values, but how do we find the best possible linear model?

In Statistics, an algorithm called linear regression is used to derive the slope and y-intercept of the best possible model by taking every datapoint into account.

Pyret has a function called lr-plot that will do just that.
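To give a feel for what "taking every datapoint into account" means, here is a sketch of the standard least-squares formulas for slope and y-intercept, applied to a tiny made-up dataset. This is not the source code of lr-plot, whose internals this lesson does not describe. The helper names total and list-mean are invented here.

```pyret
import lists as L

# Made-up data, for illustration only
xs = [list: 1, 2, 3, 4]
ys = [list: 5, 9, 6, 11]

# Helpers: the sum and the average of a list of numbers
fun total(l):
  L.fold(lam(acc, v): acc + v end, 0, l)
end

fun list-mean(l):
  total(l) / l.length()
end

x-bar = list-mean(xs)
y-bar = list-mean(ys)

# How x and y vary together, and how x varies on its own
sxy = total(L.map2(lam(x, y): (x - x-bar) * (y - y-bar) end, xs, ys))
sxx = total(L.map(lam(x): (x - x-bar) * (x - x-bar) end, xs))

# Least-squares slope and y-intercept
best-slope = sxy / sxx
best-intercept = y-bar - (best-slope * x-bar)

check:
  best-slope is 1.5
  best-intercept is 4
end
```

Every one of the four points contributes to both sums, which is why this line can fit better than a model built from just two sample points.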

More lr-plot material

If you’d like to have students dig deeper into linear regression, there’s an entire lesson you can use that spends more time interpreting results and writing about findings. This lesson also includes a discussion of 𝑅², a different measure of model fitness.

Investigate

  • How close did your models come to the optimal model?

  • Did anything about the optimal model surprise you?

Models are only useful if we know how to use them!

  • Turn to the second section of Optimizing and Interpreting Linear Models.

  • Using the interpretation of the al-ak model you’ll find there as a guide, write up your interpretation of the optimal model you just found for this dataset. Then answer the questions that follow.

  • For more practice, build linear models for other relationships in the data. You can use Building More Linear Models.

Optional Activity: Guess the Model!

  1. Divide students into teams of 2-4, and have each team come up with a linear, real-world scenario, then have them write down a linear function that fits this scenario on a sticky note. Make sure no one else can see the function!

  2. On the board or some flip-chart paper, have each team draw a scatter plot for which their linear function is the best fit. They should only draw the point cloud - not the function itself! Finally, students title their display to describe their real-world scenario (e.g. "total cost vs. number of tickets purchased").

  3. Have teams switch places or rotate, so that each team is in front of another team’s scatter plot. Have them figure out the original function, write their best guess on a sticky note, and stick it next to the plot.

  4. Have teams return to their original scatter plot, and look at the model their colleagues guessed. How close were they? What strategies did the class use to figure out the model?

    • The slope and y-intercepts can be constrained to make the activity easier or harder. For example, limiting these coefficients to whole numbers, positive numbers, etc.

    • To extend the activity, have the teams continue rotating so that each group adds their sticky note for the best-guess model. Then do a gallery walk so that students can reflect: were the models all pretty close? All over the place? Were the guesses for one coefficient grouped more tightly than the guesses for another?

Synthesize

  • When does it make sense to make an lr-plot?

    • When we’ve identified that the form of the data is linear

  • How could we use scatter plots and linear models to find out if taller NBA players tend to make more three-pointers?

  • How could we use scatter plots and linear models to find out if wealthier people live longer?

  • How could we use scatter plots and linear models to find answers to other questions?

🔗(Optional) Other Forms of Linear Models 45 minutes

Overview

Students are reminded of the three forms of linear models available to us, discuss when and why we might choose one form over another, and practice translating between them.

Launch

When trying to fit a piece into a puzzle, sometimes we rotate the piece to see it from a different angle. When fitting a model to a dataset, we might prefer to look at the linear relationship from different angles as well!

So far, we’ve focused on models using the Slope-Intercept form of the line. That’s because it’s the form that is defined in terms of the response variable, making it most compatible with the programming environment.

But depending on the information we have available to us - or who we’re writing this model for - we might want to use other forms of linear models. Fortunately, we can always translate any model into another!

You may already be familiar with the different forms of linear models available to us:

Slope-Intercept Form: 𝑦 = 𝑚𝑥 + 𝑏

  • m: slope

  • b: y-intercept

Point-Slope Form: 𝑦 - 𝑦₁ = 𝑚(𝑥 - 𝑥₁)

  • m: slope

  • 𝑦₁: y-coordinate of a point

  • 𝑥₁: x-coordinate of the same point

Standard Form: 𝐴𝑥 + 𝐵𝑦 = 𝐶

  • x-int: 𝐶/𝐴

  • y-int: 𝐶/𝐵

  • slope: -𝐴/𝐵

(1) Slope-Intercept Form makes it really easy to read the slope and y-intercept.

(2) Point-Slope Form makes it easy to find the equation of the line given a single point and slope.

(3) Standard Form makes it easy to find the x- and y-intercepts of the line.

Pose the questions below to assess student understanding of when and why we might choose one form over another.

Why might we choose to use one form over another?

  • Suppose our scatter plot has a state with 0% college enrollment, and another with $0 median income. Which linear model form would be easiest to build?

    • Standard Form

  • Suppose we only know the slope of a model, but we know the college graduation rate and median income for Rhode Island. Which form would make it easy to figure out the rest of the model?

    • Point-Slope Form

  • Which form makes it easiest to define our model in Pyret?

    • Slope-Intercept Form

Investigate

While it’s easier to write one linear form or the other based on the information available to us, and might be easier for someone else to extract the information they’re looking for based on the model we supply them with, we can easily translate back and forth between linear forms!
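As a small sketch of that translation, the code below writes one line in all three forms and checks that they agree. The line itself (slope 2, passing through the point (1, 5)) is an invented example, not a model from the states dataset, and the function names are chosen for illustration.

```pyret
# Slope-Intercept Form: y = 2x + 3
fun slope-intercept(x):
  (2 * x) + 3
end

# Point-Slope Form: y - 5 = 2(x - 1), solved for y
fun point-slope(x):
  (2 * (x - 1)) + 5
end

# Standard Form: 2x - y = -3, so A = 2, B = -1, C = -3
a = 2
b = -1
c = -3

# Solving Ax + By = C for y gives y = (C - Ax) / B
fun standard(x):
  (c - (a * x)) / b
end

check:
  # All three forms describe the same line
  slope-intercept(4) is 11
  point-slope(4) is 11
  standard(4) is 11
  # Standard Form hands us the intercepts and the slope directly
  c / a is -1.5          # x-intercept: C / A
  c / b is 3             # y-intercept: C / B
  (0 - a) / b is 2       # slope: -A / B
end
```

Only the Slope-Intercept and solved-for-y versions can be typed into Pyret as functions, which is one reason the lesson leans on Slope-Intercept Form.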

Synthesize

If you needed to draw the graph of a linear model, which form would you like to start from? Why?

🔗Additional Exercises

To practice reading linear models and connecting them to graphs:

For practice translating the models we’ve written today into other forms:

These materials were developed partly through support of the National Science Foundation (awards 1042210, 1535276, 1648684, 1738598, 2031479, and 1501927). Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.