Students continue practicing the Design Recipe, and learn how to build and transform columns in a table. They also learn how to chain methods together and define more sophisticated subsets. Finally, they consider trust and testing: how do we know whether a particular analysis is trustworthy?
Students define functions that sort, filter, or extend the animals table
Standards and Evidence Statements:
Standards with prefix BS are specific to Bootstrap; others are from the Common Core. Our Standards Document shows which units cover each standard.
Data 3.1.1: Use computers to process information, find patterns, and test hypotheses about digitally processed information to gain insight and knowledge. [P4]
BS-DR.1: The student is able to translate a word problem into a Contract and Purpose Statement
BS-DR.2: The student can derive test cases for a given contract and purpose statement
BS-DR.4: The student can solve word problems that involve data structures
BS-PL.3: The student is able to use the syntax of the programming language to define values and functions
Review
Take a minute to look back at the opening questions you saw at the beginning of the class, and choose another one that interests you.
Using what you know now, what information would you need to collect in order to answer it? What subsets would you need to create? What analysis would you need to perform?
Debrief as a class.
What kinds of displays and charts have you learned about so far?
What does each kind of display tell us about a dataset?
When would you use each kind of display?
Spend some time on this. Let students discuss among themselves, and facilitate as necessary.
Chaining Methods
Overview
Learning Objectives
Students learn the syntax for chaining methods together
Chaining Methods (Time 30 minutes)
Table methods can be chained together, so that we can build, filter, and order a Table. For example:
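A minimal sketch of one such chain follows. Only animals-table and the methods .build-column, .filter, and .order-by come from this lesson. The rows, the column names, and the helper functions nametag and is-fixed are made up for illustration.

```pyret
# Stand-in for the animals table (made-up rows, not the real dataset)
animals-table = table: name, species, age, fixed
  row: "Sasha", "cat", 1, false
  row: "Toggle", "dog", 3, true
  row: "Mittens", "cat", 2, true
  row: "Fritz", "dog", 4, true
end

# nametag : Row -> String (hypothetical helper)
# Produces a label such as "Sasha the cat"
fun nametag(r): r["name"] + " the " + r["species"] end

# is-fixed : Row -> Boolean (hypothetical helper)
fun is-fixed(r): r["fixed"] end

# Build a "label" column, keep only the fixed animals, then sort by age (ascending)
result = animals-table.build-column("label", nametag).filter(is-fixed).order-by("age", true)
```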
This code takes animals-table and builds a new column. According to our Contracts Page, .build-column produces a new Table, and that’s the Table whose .filter method we use. That method produces yet another Table, and we call that Table’s .order-by method. The Table that comes back from that is our final result.
Suggestion: use different color markers to draw nested boxes around each part of the expression, showing where each Table came from.
It can be difficult to read code that has lots of method calls chained together, so we can add a line-break before each "." to make it more readable. Here’s the exact same code, written with each method on its own line:
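As a sketch of that layout, the chain below puts each method call on its own line. The stand-in table and the helpers nametag and is-fixed are illustrative assumptions, repeated here so the snippet runs on its own.

```pyret
# Stand-in for the animals table (made-up rows, not the real dataset)
animals-table = table: name, species, age, fixed
  row: "Sasha", "cat", 1, false
  row: "Toggle", "dog", 3, true
  row: "Mittens", "cat", 2, true
  row: "Fritz", "dog", 4, true
end

# nametag : Row -> String (hypothetical helper)
fun nametag(r): r["name"] + " the " + r["species"] end

# is-fixed : Row -> Boolean (hypothetical helper)
fun is-fixed(r): r["fixed"] end

# One method per line: build, then filter, then order
result = animals-table
  .build-column("label", nametag)
  .filter(is-fixed)
  .order-by("age", true)
```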
Order matters: Build, Filter, Order.
Suppose we want to build a column and then use it to filter our table. If we use the methods in the wrong order (trying to filter by a column that doesn’t exist yet), we might wind up crashing the program. Even worse, the program might work, but produce results that are incorrect!
When chaining methods, it’s important to build first, then filter, and then order.
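The sketch below illustrates why the order matters, under some assumptions: the table, the "pounds" column, and the helpers kilos and is-heavy are all hypothetical. The filter reads the "kilos" column, so that column has to be built before the filter runs.

```pyret
# Stand-in table (made-up rows)
animals-table = table: name, pounds
  row: "Sasha", 6.5
  row: "Toggle", 48
  row: "Fritz", 92
end

# kilos : Row -> Number (hypothetical helper; converts pounds to kilograms)
fun kilos(r): r["pounds"] / 2.2 end

# is-heavy : Row -> Boolean (hypothetical helper; reads the "kilos" column)
fun is-heavy(r): r["kilos"] > 20 end

# Build first, then filter by the new column, then order (heaviest first)
heavy = animals-table
  .build-column("kilos", kilos)
  .filter(is-heavy)
  .order-by("kilos", false)

# Wrong order: filtering first would be expected to crash, because
# is-heavy asks each row for a "kilos" column that has not been built yet.
# animals-table.filter(is-heavy).build-column("kilos", kilos)
```

Try uncommenting the last line with students to see what happens when the filter runs too early.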
How well do you know your table methods? Complete Page 36 and Page 37 in your Student Workbook to find out.
Have students discuss their answers.
Confirming Analysis
Overview
Learning Objectives
Students learn how to define functions using Table Plans
Product Outcomes
Students define functions that sort, filter, or extend the animals table
Confirming Analysis (Time 20 minutes)
Data Analysis is often used to make predictions based on some sample data. For example, we might look at the Animals Dataset and try to make predictions about other animal shelters based on that sample. But if the sample dataset doesn’t represent the full population, those predictions can be wrong, and sometimes really, really wrong!
Uber and Google are making self-driving cars, which use artificial intelligence to interpret sensor data and make predictions about whether a car should speed up, slow down, or slam on the brakes. This AI is trained on a lot of sample data, which it learns from. What might be the problem if the sample data only included roads in California?
Law enforcement in many towns has started using facial-recognition software to automatically detect whether someone has a warrant out for their arrest. A lot of facial-recognition software, however, has been trained on sample data containing mostly white faces. As a result, it has gotten really good at telling white people apart, but often can’t tell the difference between people who aren’t white. Why might this be a problem?
Why might it be a bad thing to test medicines only on men (or only on women) before prescribing them to the general public?
Sample Data Matters!
A good Sample Table should be representative of the population, and relevant to what’s being analyzed.
A good Sample Table has at least the columns that matter, meaning the columns we’ll be ordering or filtering by.
A good Sample Table has enough rows to be a representative sample of the dataset. If our dataset has a mix of dogs and cats, for example, we want at least one of each in this table.
A good Sample Table has rows in mostly random order, so that we’ll notice if our analysis winds up sorting them.
Sample Tables can also be used to verify that a certain analysis is correct. For example: suppose you’ve been given a function that is supposed to filter a table and show only the cats. If you test it on a Sample Table that only has cats to begin with, will that tell you whether or not the function works?
You’ll need a table with cats and non-cats.
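To make this concrete, here is a sketch with two hypothetical versions of a cats-only function, one of which is deliberately broken. The function names, tables, and rows are all made up for illustration.

```pyret
# is-cat : Row -> Boolean (hypothetical helper)
fun is-cat(r): r["species"] == "cat" end

# Two candidate functions (both hypothetical); the second does no filtering at all
fun only-cats(t): t.filter(is-cat) end
fun broken-only-cats(t): t end

# A Sample Table that has only cats to begin with
all-cats = table: name, species
  row: "Sasha", "cat"
  row: "Mittens", "cat"
end

# A Sample Table with cats and non-cats
mixed = table: name, species
  row: "Toggle", "dog"
  row: "Sasha", "cat"
  row: "Snowcone", "rabbit"
  row: "Mittens", "cat"
end

check "a cats-only Sample Table cannot tell the two functions apart":
  only-cats(all-cats).length() is 2
  broken-only-cats(all-cats).length() is 2
end

check "a mixed Sample Table exposes the broken function":
  only-cats(mixed).length() is 2
  broken-only-cats(mixed).length() is-not 2
end
```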
Suppose you have a function that takes in a table of animals and shows only the kittens. What would your Sample Table need to have in order to verify this function?
You’ll need a table with cats and non-cats, as well as cats under the age of 2.
Suppose you have a function that takes in a table of animals and shows only the kittens, sorted in ascending order by weight. What would your Sample Table need to have in order to verify this function?
You’ll need a table with cats and non-cats, as well as cats under the age of 2, with the rows ordered randomly.
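One possible Sample Table for this last case is sketched below. The rows, the column names, and the functions is-kitten and kittens-by-weight are assumptions made for illustration. The table mixes cats and non-cats, includes cats under the age of 2 plus an older cat and a young dog, and lists the kittens out of weight order so that the sort has visible work to do.

```pyret
# Sample Table (made-up rows, deliberately not sorted by weight)
sample = table: name, species, age, pounds
  row: "Toggle", "dog", 1, 48       # young, but not a cat
  row: "Mittens", "cat", 1, 7.4     # kitten
  row: "Fritz", "dog", 4, 92
  row: "Sasha", "cat", 1, 6.5       # kitten, lighter than Mittens
  row: "Nori", "cat", 5, 11         # a cat, but not under 2
end

# is-kitten : Row -> Boolean (hypothetical helper: a cat under the age of 2)
fun is-kitten(r): (r["species"] == "cat") and (r["age"] < 2) end

# kittens-by-weight : Table -> Table (hypothetical function under test)
fun kittens-by-weight(t): t.filter(is-kitten).order-by("pounds", true) end

check "only the kittens come back, lightest first":
  result = kittens-by-weight(sample)
  result.length() is 2
  result.row-n(0)["name"] is "Sasha"
  result.row-n(1)["name"] is "Mittens"
end
```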
Turn to Page 38 in your Student Workbook. On each page, you’ve been given a function called fixed-cats and a description of what it claims to do.
List the names of the animals that you would use in a Sample Table to verify whether the function works as advertised. When you’ve finished, open the Trust-but-Verify Starter File. There are three versions of fixed-cats here. Are they all correct? If not, which ones are broken?
Debrief with the class.
Turn to Page 39. Using the same Starter File, construct a Sample Table and figure out which (if any) of the functions are correct!
Debrief with the class.
Closing
Closing (Time 5 minutes)
As our analysis gets more complex, method chaining is a great way to keep the code simple. But complex analysis also has more room for mistakes, so it’s critical to think about a Sample Table that allows us to trust that our code really does what it’s supposed to!
Bootstrap:Data Science by Emmanuel Schanzer, Nancy Pfenning, Emma Youndtsmith, Jennifer Poole, Shriram Krishnamurthi, Joe Politz and Ben Lerner was developed partly through support of the National Science Foundation (awards 1535276, 1647486, and 1738598), and is licensed under a Creative Commons 4.0 Unported License. Based on a work at www.BootstrapWorld.org. Permissions beyond the scope of this license may be available by contacting schanzer@BootstrapWorld.org.