Lessons Used in This Pathway

Intro to CODAP & Displaying Categorical Data

(Also available in CODAP)

Students learn to generate and compare bar graphs, explore other plotting & display functions in CODAP, and (optionally) design an infographic.

Lesson Goals

Students will be able to:

Read bar charts
Generate bar charts (among other data displays) from the Animals Dataset

Student-facing Lesson Goals

Let’s get to know CODAP by creating bar graphs and other data displays from tables.

Materials

Preparation

Make sure all materials have been gathered
Decide how students will be grouped in pairs
Computer for each student (or pair), with access to the internet
Students should have their Student Workbook and something to write with.

Supplemental Resources

Information is Beautiful

Welcome to CODAP! 10 minutes

Overview

Students use the options on CODAP’s Configuration menu to produce displays and plots of the Animals Dataset.

Launch

Where have you seen infographics and graphs used to display data in the real world?

Open the Animals Dataset in CODAP.

Click the graph icon from the horizontal toolbar in the upper left. (See toolbar, below.) What appears?

Select a dot with your mouse. What happens?
Try generating the graph again. Does it look the same or different? Why might this be relevant?
What happens when you select a table row? How about multiple table rows?
What happens when you click the "eye" icon (to the right of the graph or the table, depending on which is selected)?

If students report that a blank graph appears (rather than a scatterplot), prompt them to whitelist CODAP on their ad-blocker. Ad-blockers do seem to inhibit some of the functionality of CODAP (which will fortunately never advertise to users!).

Initially, the data points are randomly distributed on the graph. Selecting an orange dot reveals the name of that particular animal. Selecting a particular dot causes the table row for that animal to be highlighted in blue. Holding the shift button allows students to select multiple dots in the graphical display, or multiple rows in the table.

Students should observe that when they select a table row (or multiple table rows), the corresponding dots change from orange to turquoise. When they set aside selected and/or unselected cases (by using the "eye" icon), they temporarily change the number of animals shown from the dataset, with the option to restore the original dataset.

Students can also resize the window by dragging its borders.

Investigate

Once we have a graph of randomly distributed data points, we can organize the data by selecting attributes from our table that we want to appear on the axes of our graph.

Experiment with creating some bar charts in CODAP by following the steps below.

Select the y-axis on your graph (where it says "Click here"). On the drop-down menu that appears, select fixed. What do you notice?
Now select the x-axis on your graph and select fixed. How does the graph change?
Select the configuration icon (which looks like a bar graph) to the right of the data display. Select fuse dots into bars.
Click the ruler icon; test out each of the two options available (count and percentage). What happens?
Now, make a bar chart showing how many animals there are of each species by changing the variable on the x-axis to species.
Experiment with bar charts, either by clicking on the axis title to display a menu of attributes or by dragging a column title from the table to the axes.
Which types of attributes can be displayed with the dots fused into bars? For which types of attributes does CODAP instead offer to create a bar for each point?

To dig deeper into bar charts, have students turn to Bar Chart - Notice and Wonder.

People aren’t Hermaphrodite?

When students make a display of the sex of the animals, they will see that some animals are male, some are female and some are hermaphrodites. We use the descriptor sex rather than gender because sex refers to biology, whereas gender refers to identity. Hermaphrodite is the biological term for animals that carry eggs & produce sperm (nearly 1/3 of the non-insect animal species on the planet!). Plants that produce pollen & ovules are also hermaphrodites. While the term was previously used by the medical community to describe intersex people or people who identify as transgender or gender non-binary, it is not biologically accurate. Humans are not able to produce both viable eggs and sperm, so "hermaphrodite" is no longer considered an acceptable term to apply to people.

Common Misconceptions

Bar charts look a lot like another kind of chart, called a "histogram", which is actually quite different because it displays quantitative data, not categorical data. Making a histogram in CODAP, however, begins much like making a bar chart in CODAP: by creating a dot plot that will then be modified.

Synthesize

Bar charts display how much of the sample belongs to each category. If they are based on sample data from a larger population, we use them to infer the proportion of a whole population that might belong to each category.

Bar charts are mostly used to display categorical columns.

While bars in some bar charts should follow some logical order (alphabetical, small-medium-large, etc), they can technically be placed in any order, without changing the meaning of the chart.
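For teachers who want to see the arithmetic behind the count and percentage options, here is a minimal Python sketch (not CODAP itself). The `species` list is made-up illustration data rather than the real Animals Dataset, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical sample of species values; NOT the real Animals Dataset.
species = ["dog", "cat", "dog", "rabbit", "cat", "dog", "lizard", "dog"]


def category_counts(values):
    """Count how many cases fall into each category (the height of each bar)."""
    return Counter(values)


def category_percentages(values):
    """Express each category's count as a percentage of the whole sample."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * n / total for category, n in counts.items()}


counts = category_counts(species)
percentages = category_percentages(species)

# Print a text-only bar chart. The bar order (alphabetical here) is a
# presentation choice and does not change what the chart means.
for category in sorted(counts):
    bar = "#" * counts[category]
    print(f"{category:<8}{bar} {counts[category]} ({percentages[category]:.1f}%)")
```

Reordering the loop (for example, by count instead of alphabetically) changes the layout but not the counts or percentages, which is the point made about bar order above.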

Mini Project: Making Infographics

Infographics are a powerful tool for communicating information, especially when made by people who actually understand how to connect visuals to data in meaningful ways. Making Infographics is an opportunity for students to become more flexible math thinkers while tapping into their creativity. This project can be made on the computer or with pencil and paper. There’s also an Infographics Rubric to highlight for you and your students what an excellent infographic includes.

Exploring other Displays 30 minutes

Overview

Students freely explore the CODAP data display options available to them when they select the bar graph icon (also known as the Configuration menu). In doing so, they experiment with new charts and get comfortable with CODAP as a platform for doing data science.

Launch

There are lots of different kinds of charts and plots. Even if you don’t know what these plots are for yet, see if you can figure out how to use them.

Open a scatterplot of randomly positioned points derived from the Animals Dataset by clicking on the graph icon again.
Now, drag or select the Weight attribute/column to the X-axis.
Select the little icon that looks like a ruler; it is the Measure icon. Try producing a box plot by selecting the appropriate option from this menu.
With Weight still on the X-axis, drag or select Time to Adoption to the Y-axis.
Take another look at the Measure menu, now that your data display is two-dimensional. Which options do you see that did not appear before?
Try producing a least squares line (also known as a regression line) by using the Measure menu.

Investigate

Complete Exploring Displays and (More) Exploring Displays.

Common Misconceptions

There are many possible misconceptions about displays that students may encounter here. But that’s ok! Understanding all those other plots is not a learning goal for this lesson. Rather, the goal is to have them develop some loose familiarity.

Synthesize

Today you’ve added more data displays to your toolbox. You can create bar charts to visually display data, and even transform entire tables!

You will have many opportunities to use these concepts in this course, by applying what you’ve learned to answer data science questions.

Additional Exercises:

Practice Plotting

These materials were developed partly through support of the National Science Foundation, (awards 1042210, 1535276, 1648684, and 1738598). Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.

Transformers

(Also available in CODAP)

Students learn about Transformers, which allow them to order, filter, and build columns to extend the animals table.

Lesson Goals

Students will be able to…

order the Animals Dataset by a number of criteria
filter the Animals Dataset by species, fixed status, and age
add a column to the Animals Dataset

Student-facing Lesson Goals

Let’s learn how to transform one table into another.

Materials

Preparation

Make sure all materials have been gathered
Decide how students will be grouped in pairs
Computer for each student (or pair), with access to the internet
Students should have their Student Workbook and something to write with.

Sorting Tables, Part 1 15 minutes

Overview

Students sort a column on a table by actually changing the table.

Launch

Open the Animals Dataset in CODAP. How many animals' names start with the letter M? What would make it easier to figure out how many animals' names start with the letter M?

Investigate

Students work in groups or pairs. They will sort various columns of the table by clicking the attribute name and selecting either Sort Ascending or Sort Descending from the drop-down menu.

Let’s sort the animals alphabetically by name and see if we get the same answer as we did before!
Sort the animals by age, from youngest to oldest.
Sort the animals from heaviest to lightest.
Sort the animals table by how long it took for each animal to be adopted, in ascending order.

Synthesize

Can you think of a situation when it would be useful to sort animals from heaviest to lightest?
Does sorting animals-table produce a new table, or change the existing one? How do you know?
Can you think of a situation where CODAP’s default behavior - to alter the table each time we sort it - might inhibit more sophisticated data analysis?

Sorting Tables, Part 2 10 minutes

Overview

Using the Sort Transformer, students sort rows of a table in ascending or descending order, according to one column. When they use transformers, students are building and modifying a copy of the original table.

Launch

The Transformers plugin allows students to transform datasets to produce new, distinct datasets, rather than modifying the original input dataset itself.

First, show students how they can access the Transformers plugin (see screenshot on right), and invite them to explore the different Transformers available.

Explain to students that the Sort Transformer consumes: (1) a dataset; (2) a formula; (3) the type that the formula evaluates to; and (4) a sort direction (ascending or descending). But what does it produce?

What’s a Contract?!

Contracts help us keep track of the different Transformers we’ll be using, and how they operate on each Row of a table. Every contract has three important parts:

The Transformer’s Name
The Domain of the Transformer Expression - The type(s) of data we give it. For our purposes, this will always be a single Row of the table.
The Range of the Transformer Expression - The type of data the transformer produces from that Row

Check out the screenshot of the Transformers plugin below. What are the domain and range for Filter?

The Transformer mean doesn’t display a contract. What type of data do you think mean must consume? Why?

Investigate

Open the Transformer plugin, and choose the transformer Sort. Select animals-dataset. In the formula expression box, type Name. Select ascending as the direction and click Apply Transformer. What happens?
Next, see what happens when you select descending.
Sort the animals table from heaviest to lightest.
Sort the animals table alphabetically by species.
Sort the animals table by how long it took for each animal to be adopted, in ascending order.

Common Misconceptions

Students may be more familiar with filters that actually change the table. In CODAP, all transformers produce a brand new table. Filtered tables are automatically saved; CODAP titles each new table with a number in curly braces at the end (for example, Filter(Animals-Dataset) {1} ) to indicate how many times the transformer has been applied. When students apply a transformer, they have the option of selecting the original table from the dropdown menu, or a new table that they’ve generated. Students can also rename saved tables, if they’d like.

Synthesize

Does the transformer Sort produce a new table, or change the existing one?
You’ve now learned two different strategies for sorting a column of a table. What do the two strategies have in common? How are they different?
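The difference between the two strategies can be sketched outside CODAP. The Python below is only an analogy: `sorted` stands in for the Sort Transformer (it returns a new list), and `list.sort` stands in for the column menu (it reorders the existing list). The rows are made up for illustration.

```python
# Hypothetical rows; the names and weights are invented for illustration.
animals = [
    {"name": "Mittens", "pounds": 7.4},
    {"name": "Sasha", "pounds": 6.5},
    {"name": "Buddy", "pounds": 48.0},
]

# Like the Sort Transformer: build a NEW table, sorted heaviest to lightest.
heaviest_first = sorted(animals, key=lambda row: row["pounds"], reverse=True)

# The original table still has its original row order.
order_after_copy = [row["name"] for row in animals]

# Like Sort Ascending on the column menu: reorder the EXISTING table by name.
animals.sort(key=lambda row: row["name"])

# Now the original order is gone.
order_after_in_place = [row["name"] for row in animals]

print([row["name"] for row in heaviest_first])
print(order_after_copy)
print(order_after_in_place)
```

Both strategies produce rows in the requested order; only the second one loses the original arrangement, which is what the Synthesize questions ask students to notice.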

Filtering Tables 20 minutes

Overview

Students learn how to filter tables by removing Rows.

Launch

Explain to students that you have "Function Cards", each of which gives the purpose statement of a function that consumes a Row from a table of students and produces a Boolean (e.g. "this student is wearing glasses"). Select a volunteer to be the "Filter Transformer", have them randomly choose a card from the Function Cards set, and make sure they read it without showing it to anyone else.

Have 6-8 students line up in front of the classroom, and have the filter transformer go to each student and say "stay" or "sit" depending on whether their function would return true or false for that student. If they say "sit", the student sits down. If they say "stay", the student stays standing.

Ask the class: based on who sat and who stayed, what function was on the card?

The Filter Transformer takes a dataset and produces a copy of it that contains only the cases for which the given formula evaluates to true.

Suppose we want a table of only the animals that have been fixed. The Filter transformer consumes a dataset to filter and a formula expression that evaluates to either true or false.

Investigate

Open the Transformer plugin, and choose the transformer Filter. Select animals-dataset. In the formula expression box, type Fixed = "TRUE". Apply the transformer. What happens?
Does CODAP mind if you forget to capitalize? What about if you leave out quotation marks? Examine the error messages that appear if you are just a little careless as you enter text into the formula expression box.
This time, in the formula expression box, type Age > 5. What did you get?
Now try Species = "dog"

The Filter Transformer walks through the table, applying the formula it was given to each row, and produces a new table containing only the rows for which the formula returned true. The formula must consume a single Row and produce a Boolean; if it consumes or produces anything else, we’ll get an error.
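The row-by-row behavior described here can be sketched in a few lines of Python. This is an illustration of the idea, not how CODAP implements Filter; `filter_table` and the sample rows are hypothetical.

```python
def filter_table(table, formula):
    """Return a NEW table holding only the rows for which formula(row) is True.

    Raises TypeError if the formula produces anything other than a Boolean.
    """
    kept = []
    for row in table:
        result = formula(row)
        if not isinstance(result, bool):
            raise TypeError("the formula must produce a Boolean")
        if result:
            kept.append(dict(row))  # copy the row so the original is untouched
    return kept


# Hypothetical rows; NOT the real Animals Dataset.
animals = [
    {"name": "Sasha", "species": "cat", "age": 1, "fixed": False},
    {"name": "Buddy", "species": "dog", "age": 9, "fixed": True},
    {"name": "Toggle", "species": "dog", "age": 3, "fixed": True},
]

fixed_animals = filter_table(animals, lambda row: row["fixed"])
older_than_five = filter_table(animals, lambda row: row["age"] > 5)

print([row["name"] for row in fixed_animals])
print([row["name"] for row in older_than_five])
```

Passing a formula that returns a number, such as `lambda row: row["age"]`, raises the error, mirroring the requirement that the formula evaluate to true or false.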

Synthesize

Debrief with students. Some guiding questions on filtering:

Suppose we wanted to determine whether cats or dogs get adopted faster. How might using the Filter transformer help?
If the shelter is purchasing food for older cats, what filter would we write to determine how many cats to buy for?
Can you think of a situation where filtering fixed animals would be helpful?

Building Columns 10 minutes

Overview

Students learn how to build columns, using the Build Attribute transformer.

Launch

Suppose we want to transform our table, converting pounds to kilograms or weeks to days. The Build Attribute transformer makes a new copy of a dataset, and adds a new attribute. We must provide a dataset, a name for the new attribute, an existing collection to add the attribute to, a formula for the attribute’s values, and an indication of the type of value the formula will evaluate to.

Investigate

Open the Transformer plugin, and choose the transformer Build Attribute. Select animals-dataset.
Enter Young as the Name of New Attribute. Select cases as the Collection to Add To. In the formula expression box, type Age < 5. Apply the transformer. What happens?
Now, enter is-cat as the Name of New Attribute and try typing Species = "cat" in the formula expression box. What do you get? What do you think is going on?

The Build Attribute Transformer walks through the table, applying whatever formula expression it was given to each row. Whatever the formula expression produces for that row becomes the value of our new column, which is named based on the string it was given. In the first example, we gave it Age < 5, so the new table had an extra Boolean column for every animal, indicating whether or not it was young.
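As a rough illustration of the same idea in Python (not CODAP's implementation), the sketch below copies every row and adds one computed value to each copy. `build_attribute`, the sample rows, and the `kilograms` column are hypothetical; the conversion factor 0.45359237 kg per pound is the standard definition.

```python
def build_attribute(table, name, formula):
    """Return a NEW table in which every row has an extra column `name`,
    whose value is formula(row). The input table is left unchanged."""
    return [{**row, name: formula(row)} for row in table]


# Hypothetical rows; NOT the real Animals Dataset.
animals = [
    {"name": "Sasha", "age": 1, "pounds": 6.5},
    {"name": "Buddy", "age": 9, "pounds": 10.0},
]

# A Boolean column, like typing Age < 5 in the formula expression box.
with_young = build_attribute(animals, "Young", lambda row: row["age"] < 5)

# A numeric column: convert pounds to kilograms.
with_kilograms = build_attribute(
    animals, "kilograms", lambda row: row["pounds"] * 0.45359237
)

print(with_young)
print(with_kilograms)
```

The formula's result type determines the new column's type: the first call yields true/false values, the second yields numbers.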

Synthesize

Debrief with students. Ask them if they can think of a situation where they would want to use this. Some ideas:

A dataset from Europe might list everything in metric (centimeters, kilograms, etc), so we could build a column to convert that to imperial units (inches, pounds, etc).
A dataset about schools might include columns for how many students are in the school and how many of those students identify as multi-racial. But when comparing schools of different sizes, what we really want is a column showing what percentage of students identify as multi-racial. We could use Build Attribute to compute that for every row in the table.

Being able to filter tables and build new columns is a huge upgrade in our ability to analyze data! But as a wise person once said, "with great power comes great responsibility"! Dropping all the dogs from our dataset might be a cute exercise in this class, but suppose we want to drop certain populations from a national census? Even a small programming error could erase millions of people, impact funding for things like roads and schools, etc.

Additional Exercises:

What Table Do We Get?

Method Chaining

Standards in this Lesson

CSTA Standards

1B-AP-10: Create programs that include sequences, events, loops, and conditionals.
2-AP-13: Decompose problems and subproblems into parts to facilitate the design, implementation, and review of programs
2-AP-17: Systematically test and refine programs using a range of test cases
3A-AP-17: Decompose problems into smaller components through systematic analysis, using constructs such as procedures, modules, and/or objects.
3A-AP-18: Create artifacts by using procedures within a program, combinations of data and procedures, or independent but interrelated programs.

K-12CS Standards

6-8.Algorithms and Programming.Control: Programmers select and combine control structures, such as loops, event handlers, and conditionals, to create more complex program behavior.
9-12.Algorithms and Programming.Control: Programmers consider tradeoffs related to implementation, readability, and program performance when selecting and combining control structures.
9-12.Algorithms and Programming.Modularity: Complex programs are designed as systems of interacting modules, each with a specific role, coordinating for a common overall purpose. These modules can be procedures within a program; combinations of data and procedures; or independent, but interrelated, programs. Modules allow for better management of complex tasks.

Oklahoma Standards

OK.7.AP.M.01: Decompose problems into parts to facilitate the design, implementation, and review of increasingly complex programs.
OK.8.AP.M.01: Decompose problems and subproblems into parts to facilitate the design, implementation, and review of complex programs.
OK.A1.D.2.1: Select and apply counting procedures, such as the multiplication and addition principles and tree diagrams, to determine the size of a sample space (the number of possible outcomes) and to calculate probabilities.
OK.L1.AP.M.01: Break down a solution into procedures using systematic analysis and design.
OK.L1.AP.M.02: Create computational artifacts by systematically organizing, manipulating and/or processing data.

Textbook Alignment

IM Algebra 1

IM.Alg1.4.3: Interpreting & Using Function Notation

Practices in this Lesson

P3: Recognizing and Defining Computational Problems
MLR.8: Discussion Supports
MP.6: Attend to precision

(Also available in CODAP)

Students learn how to chain Methods together, and define more sophisticated subsets.

Lesson Goals

Students will be able to…

Use method chaining to write more sophisticated analyses using less code
Identify bugs introduced by chaining methods in the wrong order

Student-facing Lesson Goals

Let’s practice writing functions and combining methods.

Materials

Preparation

Make sure all materials have been gathered
Decide how students will be grouped in pairs
Computer for each student (or pair), with access to the internet. All students should log into CPO and open the "Animals Starter File" they saved from the prior lesson. If they don’t have the file, they can open a new one.
Students should have their Student Workbook and something to write with.

Design Recipe Practice 25 minutes

Overview

Students practice more of what they learned in the previous lesson, applying the Design Recipe to make table functions that operate on rows of the Animals Dataset. These become the basis of the chaining activity that follows.

Launch

The Design Recipe is a powerful tool for solving problems by writing functions. It’s important for this to be like second nature, so let’s get some more practice using it!

Investigate

Define the Compute functions on The Design Recipe: is-dog / is-female and The Design Recipe: is-old / name-has-s.

Optional: Combining Booleans

Suppose we want to build a table of Animals that are fixed and old, or a table of animals that are cats or dogs?

By using the and and or operators, we can combine boolean tests, as in: (1 > 2) and ("a" == "b"). This is handy for more complex programs! For example, we might want to ask if a character in a video game has run out of health points and if they have any more lives. We might want to know if someone’s ZIP Code puts them in Texas or New Mexico. When you go out to eat at a restaurant, you might ask what items on the menu have meat and cheese.

For many of the situations where you might use and, there’s actually a much more powerful mechanism you can use, called "Method Chaining"!
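For comparison only, here is how the two tables mentioned above could be described with combined boolean tests in Python, whose `and` and `or` keywords read much like Pyret's. This is a sketch rather than Pyret code; the rows, the "older than 5" cutoff, and the function names are assumptions made for illustration.

```python
# Hypothetical rows; NOT the real Animals Dataset.
animals = [
    {"name": "Sasha", "species": "cat", "age": 1, "fixed": False},
    {"name": "Buddy", "species": "dog", "age": 9, "fixed": True},
    {"name": "Nibblet", "species": "rabbit", "age": 6, "fixed": False},
]


def is_fixed_and_old(row):
    """True only when BOTH tests pass (old is assumed to mean age > 5)."""
    return row["fixed"] and row["age"] > 5


def is_cat_or_dog(row):
    """True when AT LEAST ONE of the tests passes."""
    return row["species"] == "cat" or row["species"] == "dog"


fixed_and_old = [row["name"] for row in animals if is_fixed_and_old(row)]
cats_or_dogs = [row["name"] for row in animals if is_cat_or_dog(row)]

# The expression from the text: both sides are false, so the result is false.
combined = (1 > 2) and ("a" == "b")

print(fixed_and_old, cats_or_dogs, combined)
```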

Synthesize

Did students find themselves getting faster at using the Design Recipe? Can students share any patterns they noticed, or shortcuts they used?

Chaining 25 minutes

Overview

Students learn how to compose multiple table operations (sorting, filtering, building) on the same table - a technique called "chaining".

Launch

Now that we are doing more sophisticated analyses, we might find ourselves writing the following code:

# get a table with the nametags of all the fixed animals, ordered by species
with-labels = animals-table.build-column("labels", nametag)
fixed-with-labels = with-labels.filter(is-fixed)
result = fixed-with-labels.order-by("species", true)

That’s a lot of code, and it also requires us to come up with names for each intermediate step! Pyret allows table methods to be chained together, so that we can build, filter and order a Table in one shot. For example:

# get a table with the nametags of all the fixed animals, ordered by species
result = animals-table.build-column("labels", nametag).filter(is-fixed).order-by("species", true)

This code takes the animals-table, and builds a new column. According to our Contracts Page, .build-column produces a new Table, and that’s the Table whose .filter method we use. That method produces yet another Table, and we call that Table’s order-by method. The Table that comes back from that is our final result.

Teaching Tip

Use different color markers to draw nested boxes around each part of the expression, showing where each Table came from.

It can be difficult to read code that has lots of method calls chained together, so we can add a line-break before each “.” to make it more readable. Here’s the exact same code, written with each method on its own line:

# get a table with the nametags of all the fixed animals, ordered by species
animals-table
  .build-column("labels", nametag)
  .filter(is-fixed)
  .order-by("species", true)

Order matters: Build, Filter, Sort.

Suppose we want to build a column and then use it to filter our table. If we use the methods in the wrong order (trying to filter by a column that doesn’t exist yet), we might wind up crashing the program. Even worse, the program might work, but produce results that are incorrect!
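The "crash" case can be reproduced outside Pyret. The sketch below uses Python lists of dictionaries in place of Pyret Tables, so the helper functions `build_column` and `filter_rows` are stand-ins written for this example, not Pyret's `.build-column` and `.filter` methods. The rows and the `old` column are hypothetical.

```python
def build_column(table, name, fn):
    """Return a new table with an extra column computed from each row."""
    return [{**row, name: fn(row)} for row in table]


def filter_rows(table, fn):
    """Return a new table with only the rows for which fn(row) is true."""
    return [row for row in table if fn(row)]


# Hypothetical rows; NOT the real Animals Dataset.
animals = [
    {"name": "Sasha", "species": "cat", "age": 1},
    {"name": "Buddy", "species": "dog", "age": 9},
    {"name": "Toggle", "species": "dog", "age": 3},
]


def is_old(row):
    return row["age"] > 5


def marked_old(row):
    # Relies on an "old" column that only exists after build_column has run.
    return row["old"]


# Build first, then filter: the "old" column exists when the filter runs.
right_order = filter_rows(build_column(animals, "old", is_old), marked_old)

# Filter first: the "old" column has not been built yet, so the lookup fails.
try:
    filter_rows(animals, marked_old)
    wrong_order_error = None
except KeyError as err:
    wrong_order_error = err

print([row["name"] for row in right_order])
print(repr(wrong_order_error))
```

A crash like this is the friendlier failure, because it is noticed immediately; an ordering mistake that runs without error but returns the wrong rows is much harder to spot.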

Investigate

When chaining methods, it’s important to build first, then filter, and then order.

How well do you know your table methods? Complete Chaining Methods and Chaining Methods 2: Order Matters in your Student Workbook to find out.

Synthesize

As our analysis gets more complex, chaining is a great way to re-use work we’ve already done. And less duplicate work means a smaller chance of bugs. Chaining is a powerful way to work, so it’s critical to think carefully when we use it!

Additional Exercises

Table Transformations with Method Chaining

Randomness and Sample Size

Standards in this Lesson

Common Core Math Standards

HSS.IC.B.3: Recognize the purposes of and differences among sample surveys, experiments, and observational studies; explain how randomization relates to each.

CSTA Standards

2-DA-08: Collect data using computational tools and transform the data to make it more useful and reliable.
2-DA-09: Refine computational models based on the data they have generated.

Oklahoma Standards

OK.L1.IC.C.02: Test and refine computational artifacts to reduce bias and equity deficits.
OK.PA.A.2.2: Identify, describe, and analyze linear relationships between two variables.
OK.PA.D.2.2: Determine how samples are chosen (random, limited, biased) to draw and support conclusions about generalizing a sample to a population.

Textbook Alignment

IM 7 Math™

IM.7.8.17: More about Sampling Variability
IM.7.8.14: Sampling in a Fair Way
IM.7.8.12: Larger Populations

Connected Math

CMP.7.8: Samples and Populations: Making Comparisons and Predictions

Practices in this Lesson

SJ.10: Students will examine diversity in social, cultural, political and historical contexts rather than in ways that are superficial or oversimplified.
SEP.3: Planning and Carrying Out Investigations
MP.2: Reason abstractly and quantitatively
MLR.1: Stronger and Clearer Each Time

(Also available in CODAP)

Students learn about random samples and statistical inference, as applied to the Animals Dataset. In the process, students get a light introduction to the role of sample size and the importance of statistical inference.

Lesson Goals

Students will be able to…

Take random samples from a population
Understand the need for random samples
Understand the role of sample size

Student-facing Lesson Goals

Let’s explore how random sampling can be used with datasets.

Materials

Preparation

Make sure all materials have been gathered.
Decide how students will be grouped in pairs.
Computer for each student (or pair), with access to the internet
Students should have their Student Workbook and something to write with.

Glossary

statistical inference: using information from a sample to draw conclusions about the larger population from which the sample was taken

Flip the Script: Inference v. Probability 45 minutes

Overview

Statistical inference involves looking at a sample and trying to infer something you don’t know about a larger population. This requires a sort of backwards reasoning, kind of like making a guess about a cause, based on the effect that we see. To better understand the process of going from the sample back to the population, it helps to understand the more straightforward process of going from the population to a sample. If the sample is random, we call this process Probability!

In real life we typically don’t know what’s true for an entire population. But this probability thought-experiment will start with a larger population with known properties (such as the fact that nearly half of the entire population are males). Then we’ll see what kind of behavior we tend to see in random samples taken from that population.

Launch

Inference Reasons Backwards; Probability Reasons Forwards

One of the most useful tasks in Data Science is using sample data to infer (guess) what’s true about the larger population from which the sample was taken. This process, called statistical inference, is used to gain information in practically every field of study you can imagine: medicine, business, politics, history; even art! Early on, statisticians discovered that random samples almost always work best.

Suppose we want to estimate what percentage of all Americans plan to vote for a certain candidate. We can’t ask everyone who they’re voting for, so pollsters instead take a sample of Americans, and generalize the opinion of the sample to estimate how Americans as a whole feel. But choosing a sample can be tricky…

Would it be problematic to only call voters who are registered Democrats? To only call voters under 25? To only call regular churchgoers? Why or why not?
How could we choose a representative subset, or sample of American voters?
Would it be problematic to only sample a handful of voters? What do we gain by taking a larger sample?

Before we infer something unknown about a population from a sample, we need to know what makes a "good" sample!

Sampling is a complicated issue. The main reason for doing inference is to guess about something that’s unknown for the whole population. But a useful step along the way is to practice with situations where we happen to know what’s true for the whole population. As an exercise, we can keep taking random samples from that population and see how close they tend to get us to the truth. Another discovery (besides the value of randomness) that statisticians made early on was something that’s perfectly consistent with common sense: Larger samples are better than smaller ones, because they tend to get us closer to the truth about the whole population.
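The thought experiment of repeatedly sampling from a population whose truth we already know can be simulated. The Python sketch below is one way to do it, under stated assumptions: the population of 1,000 animals with 48% males is invented, and the sample sizes (10 and 200), the number of trials, and the random seed are arbitrary choices.

```python
import random
import statistics


def sample_proportions(population, sample_size, trials, rng):
    """Take `trials` random samples (without replacement) and return the
    proportion of males found in each sample."""
    proportions = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        proportions.append(sample.count("male") / sample_size)
    return proportions


rng = random.Random(2024)  # fixed seed so the run is repeatable

# Hypothetical population with a KNOWN truth: 48% of the animals are male.
population = ["male"] * 480 + ["female"] * 520

small = sample_proportions(population, 10, 500, rng)
large = sample_proportions(population, 200, 500, rng)

# How much do the sample results bounce around from one sample to the next?
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

print("typical miss with samples of 10: ", round(spread_small, 3))
print("typical miss with samples of 200:", round(spread_large, 3))
```

In a run like this, the proportions from the larger samples cluster much more tightly around the true value of 0.48 than those from the smaller samples, which is the pattern students look for in the activity.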

Let’s see what happens if we switch from smaller to larger sample sizes, if we’re taking a random sample of shelter animals to infer what’s true about the larger population…

Students should log into CPO, open the Expanded Animals Starter File (Pyret), and save a copy.

Investigate

The Animals Dataset we’ve been using is just one sample taken from a very large animal shelter. How much can we infer about the whole population of hundreds of animals, by looking at just this one sample?

Divide the class into groups of 3-5 students.
Have students open the Expanded Animals Starter File (Pyret), and click "Run".
Have students complete Sampling and Inference, sharing their results and discussing with the group.
For a deeper exploration of the impact of sample size, have students complete Predictions from Samples.

Common Misconceptions

Many people mistakenly believe that larger populations need to be represented by larger samples. In fact, the formulas that Data Scientists use to assess how well a sample represents its population depend only on the sample size, not the population size.

Extension

In a statistics-focused class, or if appropriate for your learning goals, this is a great place to include more rigorous statistics content on sample size, sampling bias, etc.

Synthesize

Have students share. Were larger samples always better for guessing the truth about the whole population? If so, how much better?

Project Options: Food Habits / Time Use

In both of these projects, students gather data about their own lives and use what they’ve learned in the class so far to analyze it. This project can be used as a mid-term or formative assessment, or as a capstone for a limited implementation of Bootstrap:Data Science. See the project descriptions for Food Habits and Time Use.

(Based on the projects of the same name from IDS at UCLA)

These materials were developed partly through support of the National Science Foundation, (awards 1042210, 1535276, 1648684, and 1738598). Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.

Grouped Samples


Standards in this Lesson

Common Core Math Standards

6.EE.B.6: Use variables to represent numbers and write expressions when solving a real-world or mathematical problem; understand that a variable can represent an unknown number, or, depending on the purpose at hand, any number in a specified set.
8.SP.A.1: Construct and interpret scatter plots for bivariate measurement data to investigate patterns of association between two quantities. Describe patterns such as clustering, outliers, positive or negative association, linear association, and nonlinear association.

CSTA Standards

2-AP-11: Create clearly named variables that represent different data types and perform operations on their values.
2-DA-08: Collect data using computational tools and transform the data to make it more useful and reliable.
2-DA-09: Refine computational models based on the data they have generated.

Oklahoma Standards

OK.6.D.1.3: Create and analyze box and whisker plots observing how each segment contains one quarter of the data.
OK.7.D.1.2: Use reasoning with proportions to display and interpret data in circle graphs (pie charts) and histograms. Choose the appropriate data display and know how to create the display using a spreadsheet or other graphing technology.
OK.8.DA.CVT.01: Develop, implement, and refine a process that utilizes computational tools to collect and transform data to make it more useful and reliable.
OK.8.DA.S.01: Analyze multiple methods of representing data and choose the most appropriate method for representing data.
OK.A1.D.1.1: Describe a data set using data displays, describe and compare data sets using summary statistics, including measures of central tendency, location, and spread. Know how to use calculators, spreadsheets, or other appropriate technology to display data and calculate summary statistics.
OK.L1.AP.A.01: Create a prototype that uses algorithms (e.g., searching, sorting, finding shortest distance) to provide a possible solution for a real-world problem.
OK.L1.IC.C.02: Test and refine computational artifacts to reduce bias and equity deficits.
OK.PA.A.2.2: Identify, describe, and analyze linear relationships between two variables.
OK.PA.D.1.1: Describe the impact that inserting or deleting a data point has on the mean and the median of a data set. Know how to create data displays using a spreadsheet and use a calculator to examine this impact.

Textbook Alignment

IM 7 Math™

IM.7.8.18: Comparing Populations Using Samples
IM.7.8.11: Comparing Groups

Practices in this Lesson

P3: Recognizing and Defining Computational Problems
SEP.3: Planning and Carrying Out Investigations
MP.3: Construct viable arguments and critique the reasoning of others
MP.2: Reason abstractly and quantitatively

(Also available in CODAP)

Students practice creating subsets and think about why it might sometimes be useful to answer questions about a dataset through the lens of specific subsets.

Lesson Goals

Students will be able to…

Make grouped samples from a population

Student-facing Lesson Goals

Let’s combine what we know about sampling and filtering with creating displays.

Materials

Preparation

Make sure all materials have been gathered
Decide how students will be grouped in pairs
Computer for each student (or pair), with access to the internet
Students should have Student workbook and something to write with.
All students should log into CPO and open the "Animals Starter File" they saved from the prior lesson. If they don’t have the file, they can open a new one.

Glossary

grouped sample: a non-random subset of individuals chosen from a larger set, where the individuals belong to a specific group

Problems with a Single Population 10 minutes

Overview

This activity is all about grouped samples: Students make a bunch of subsets from the Animals Dataset, and see how each subset might answer the same question differently.

Launch

When looking at a scatter plot of our animals, it looks like the amount an animal weighs may have something to do with how long it takes to be adopted.

But if we label the dots by animal, we notice that every data point above 25 pounds belongs to a dog from the shelter! The cats are all clumped together in the lower weight range, making it hard to see how weeks to adoption may relate to a cat’s weight.


Investigate

Divide the class into groups of 3-4, with one student identified as the "reporter".

Looking at this scatterplot, does it make sense to analyze all the animals together? Why or why not?
Are there some questions where it would be important to break up the population into species-specific populations? What are they?
Are there some questions where it would be important to keep the whole population together? What are they?

Synthesize

Have the reporters share their findings with the class.

Imagine that you’ve been handed a dataset from a country where half the people are wealthy and have access to amazing medical care, and the other half are poor and have no healthcare. If we took a random sample of the population as a whole, we might think that they are generally middle-income and have average health. But if we ask the same question about the two groups separately, we would discover inequality hiding in plain sight!
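The arithmetic behind this thought experiment can be sketched in a few lines of Pyret. The health scores below are invented purely for illustration, and wealthy, poor, and average are hypothetical names that do not come from any starter file.

```pyret
# Invented health scores (0-100) for two equally sized groups.
wealthy = [list: 90, 95, 100]
poor = [list: 10, 15, 20]

# average : List<Number> -> Number
fun average(nums):
  fold(lam(acc, n): acc + n end, 0, nums) / nums.length()
end

# Mixing the two groups into one population.
everyone = wealthy.append(poor)

check:
  average(everyone) is 55   # the population as a whole looks "average"
  average(wealthy) is 95    # but the two groups are far apart
  average(poor) is 15
end
```

No individual in this made-up population has a score anywhere near 55, which is the sense in which the overall average hides the inequality.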

Grouped Samples 20 minutes

Launch

Ultimately, it might make more sense to ask certain questions about "just the cats" or "just the dogs". Averaging every animal together will give us an answer, but it may not be a useful one.

Sometimes important facts about samples get lost if we mix them with the rest of the population!

Data Scientists define grouped samples of datasets, breaking them up into sub-groups that may be helpful in their analysis.

Earlier, you learned how to define values in Pyret. We can define Numbers, Strings, Images, and even rows:

name  = "Flannery"
age   = 16
logo  = star(50, "solid", "red")
sasha = animals-table.row-n(0)

Let’s use this skill to define Tables…

Investigate

We already know how to define values, and how to filter a dataset. So let’s put those skills together to define a grouped sample of the dogs in the shelter:

dogs = animals-table.filter(is-dog)

A “kitten” is an animal who is a cat and who is young. How would you define a table of just kittens?

Turn to Grouped Samples from the Animals Dataset, and see what code will compute whether or not an animal is a kitten.
Can you fill in the code for the other grouped samples?
When you’re done, type these definitions into the Definitions Area.
Make a bar chart showing the distribution of sex in the kittens subset, by typing bar-chart(kittens, "sex").
Make bar charts showing the sex column for every grouped sample. Which one best represents the distribution of sex for the whole population? Why?
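As a teacher reference, one way the kitten definition might look is sketched below. It is a sketch under stated assumptions, not the workbook's answer key: the small table is an invented stand-in for the Animals Dataset, the column names species and age are assumed to match the real file, and the cutoff for "young" (under 2 years) is an arbitrary choice. Run it in a new, empty file, since the starter file may already define some of these names.

```pyret
# A tiny, made-up stand-in for the Animals Dataset (not the real data).
animals-sample = table: name, species, sex, age
  row: "Mittens", "cat", "female", 1
  row: "Rex", "dog", "male", 1
  row: "Tiger", "cat", "male", 7
  row: "Pip", "cat", "male", 1
end

# is-cat : Row -> Boolean
fun is-cat(r): r["species"] == "cat" end

# is-young : Row -> Boolean
# ASSUMPTION: "young" means under 2 years old.
fun is-young(r): r["age"] < 2 end

# is-kitten : Row -> Boolean
# A kitten is an animal that is both a cat and young.
fun is-kitten(r): is-cat(r) and is-young(r) end

# The grouped sample: only the rows for which is-kitten is true.
kittens = animals-sample.filter(is-kitten)

check:
  kittens.length() is 2
  kittens.row-n(0)["name"] is "Mittens"
end
```

In the starter file, the same pattern would apply with animals-table in place of the stand-in table.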

Synthesize

Debrief with students. Thoughtful question: how could we filter and sort a table? How can we combine methods?
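One possible answer to bring to the debrief is method chaining: because .filter produces a Table, .order-by can be applied directly to its result. The sketch below assumes an invented three-row table, and shelter, is-dog, and dogs-by-age are illustrative names (run it in a new, empty file, since the starter file may already define is-dog).

```pyret
# A tiny, made-up table (not the real Animals Dataset).
shelter = table: name, species, age
  row: "Rex", "dog", 5
  row: "Mittens", "cat", 1
  row: "Fido", "dog", 2
end

# is-dog : Row -> Boolean
fun is-dog(r): r["species"] == "dog" end

# Filter first, then sort the filtered table by age, ascending (true).
dogs-by-age = shelter.filter(is-dog).order-by("age", true)

check:
  dogs-by-age.length() is 2
  dogs-by-age.row-n(0)["name"] is "Fido"
end
```

Here the order of the two methods does not change the final table, but filtering first means fewer rows to sort, which is a useful point to raise with students.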

Displaying Samples 20 minutes

Overview

Students revisit the data display activity, now using the samples they created.

Launch

Making grouped and random samples is a powerful skill to have, which allows us to dig deeper than just making charts or asking questions about a whole dataset. Now that we know how to make subsets, we can make much more sophisticated displays!

Investigate

Complete Displaying Data, using what you’ve learned about samples to make more sophisticated data displays.

Synthesize

Were any of the students' displays interesting or surprising? Given a novel question, can students identify what helper functions they would need to write?
