Is there a relationship between the amount of sugar in a meal and the number of calories? Do more caloric meals tend to have more sugar? To answer this question, students are introduced to Lists and Tables in Pyret, as well as scatter plots as a way of visualizing data.
categorical data: data whose values come from a fixed number of possible categories
column: a set of values of a particular type in a table. Every row has exactly one element in every column
dataset: a collection of related information that is composed of separate elements, but can be manipulated as a unit by a computer
entry: a single value in a table, belonging to a particular row and column
extract: to construct a list using data from a particular column in a table
header: the part of a table that identifies the name of each column
list: a data structure containing a sequence of values
quantitative data: data with values that measure some amount or quantity; may not have a fixed number of possible values
row: an item in a table, consisting of a group of values; each table is a collection of rows
table: a data structure that stores data as rows, with entries in particular columns
Materials:
Preparation:
Show students the opening questions, either as a handout or on posters set up around the room.
Types | Functions
Number | +, -, *, /, num-sqrt, num-sqr
String | n/a
Image | draw-plot
Series | function-plot
Introduction
(Time 15 minutes)
A lot of restaurants these days publish nutrition information as part of their menu. When you look at these menus, you can find out how many calories each item contains. How much of a role does sugar play in that? Do sugary meals always have more calories than meals with less sugar? Do high-calorie meals always have a lot of sugar in them?
Turn to Page 4. Take two minutes to write down what you think, and why.
Give the class a minute for open discussion. The more they are engaged with the question, the more substantial their answers will be.
So far you have learned the fundamentals of writing programs that do work on simple data like Numbers (1, -6, 3.5, etc.) and Strings ("hello", "17", etc.). As data scientists, we need to write programs that work on complex data like restaurant menus, which can have many related parts and contain dozens or millions of entries. A collection of related data that can be grouped and manipulated by a computer is called a dataset. In this unit, you will learn how Pyret works with real-world data, and how to answer data science questions like this one by writing programs over that data.
Tabular Data
(Time 15 minutes)
First | Last | Eye-Color | Height
"John" | "Doe" | "Green" | 52.0
"Jane" | "Smith" | "Brown" | 49.1
"Javon" | "Jackson" | "Brown" | 57.7
"Angela" | "Enriquez" | "Hazel" | 52.5
"Jack" | "Thompson" | "Blue" | 53.0
"Dominique" | "Rodriguez" | "Hazel" | 51.1
"Sammy" | "Carter" | "Blue" | 56.2
"Andrea" | "Garcia" | "Brown" | 50.8
Show the kids slides with each image of the example table, focusing on the different aspects of tables. Emphasize that each row represents a distinct object, and each entry in a row stores information about that object.
This is a table containing information about students in a fourth grade class. Tables are collections of cells, or "entries", where each entry contains one value. In most of the tables you will encounter, these values will be of type Number or String.
Tables are organized into columns and rows.
How many columns does this table have?
All of the entries in a particular column will contain values that are the same type, and represent the same thing. For example, each entry in the 3rd column represents the eye color of a person in the class.
Eye-Color
"Green"
"Brown"
"Brown"
"Hazel"
"Blue"
"Hazel"
"Blue"
"Brown"
At the top of our table is the header. Each part of the header is the name of a particular column. The header is not a row! A table with no data has no rows, even though it may have a header. For example:
First | Last | Eye-Color | Height
Turn to Page 5 in your workbook, and use your knowledge of tables to answer the questions there.
Now that you know the basic information about tables, it’s time to get some hands-on experience with them in Pyret. Pyret allows us to write programs that work on tables, similar to how we write programs that work on Numbers and Strings.
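As a rough illustration of what a table looks like in code, the example class table could be written as a Pyret table literal. This is only a sketch: the identifier students and the lowercase column names are assumptions made for this example, and the starter file loads its tables from Google Sheets rather than writing them out this way.

```pyret
# A sketch of the example class table as a Pyret table literal.
# `students` and the lowercase column names are hypothetical.
# The first line after `table:` is the header; each `row:` is one student.
students = table: first, last, eye-color, height
  row: "John", "Doe", "Green", 52.0
  row: "Jane", "Smith", "Brown", 49.1
  row: "Javon", "Jackson", "Brown", 57.7
  row: "Angela", "Enriquez", "Hazel", 52.5
  row: "Jack", "Thompson", "Blue", 53.0
  row: "Dominique", "Rodriguez", "Hazel", 51.1
  row: "Sammy", "Carter", "Blue", 56.2
  row: "Andrea", "Garcia", "Brown", 50.8
end
```

Typing students into the interactions window after clicking Run should display the table, with the header on top and one row per student.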
In supplemental lessons, students/teachers can load their own tables into Google Sheets, as well as design surveys to populate Google Sheets with data. However, in the core curriculum this will not be covered; students will use scaffolded code to work with tables exclusively in Pyret.
Open up the Unit 2 Starter File. Make sure you are signed into your Google account, and click the "Save a Copy" button. This will save a copy of the file into your own account, so that you can make changes and retrieve them later.
As you learned in Unit 1, Pyret allows us to define names for values and expressions, so that we can refer to them later. Every definition you’ve seen involves an identifier, followed by the equals sign, and an expression to bind to that name. For example, we can define the identifier name to be the string "Tara" by writing name = "Tara".
Some definitions are more complex than that. What identifiers do you see defined in this file?
The definitions area contains code to load two tables from Google Sheets: presidents and nutrition. As you saw at the beginning of the lesson, tables need headers to describe them. Each load-table: block specifies the headers for the table being imported, and each block comes after an = sign, which binds the names presidents and nutrition to the loaded tables.
To evaluate a variable, we click Run and type its name into the Interactions Area. We do the same thing if that variable is a Number, a String, or even a Table! Click Run, and type each of these programs into the interactions window, then hit Enter/Return.
nutrition
presidents
The students should not need to know exactly how the table-loading code works. However, for those that are curious:
The include statement allows Pyret to use a supplementary module which can talk to the Google Sheets API. This lets students use/apply functions that take data from Google Sheets.
The load-spreadsheet function applications are what will find particular spreadsheets and their content. The argument is a String that is a unique ID to a particular Google Sheet. We have hardcoded these IDs in the scaffolding so that they link to our presidents and nutrition tables.
The load-table command is what actually loads a table that we can use in the interactions window. This expression enumerates each of the column names (establishing what the header contains), and specifies that the source will be the sheets returned by the load-spreadsheet function applications.
Quantitative and Categorical Data
(Time 15 minutes)
You have loaded your first tables into Pyret. These tables contain different data: the first is a table about the presidents of the US, and the second has nutritional information about items on a menu. Before we can dive into all of the cool things you can do with tables, we need to understand the two different kinds of data that can appear in tables: categorical and quantitative.
Let’s take another look at the first example table we saw in Unit 2.
First | Last | Eye-Color | Height
"John" | "Doe" | "Green" | 52.0
"Jane" | "Smith" | "Brown" | 49.1
"Javon" | "Jackson" | "Brown" | 57.7
"Angela" | "Enriquez" | "Hazel" | 52.5
"Jack" | "Thompson" | "Blue" | 53.0
"Dominique" | "Rodriguez" | "Hazel" | 51.1
"Sammy" | "Carter" | "Blue" | 56.2
"Andrea" | "Garcia" | "Brown" | 50.8
The first kind of data we will look at is Quantitative Data. Quantitative Data always measures an amount of something. If a question asks "how much" there is of something, the answer will be quantitative data.
Look at the Height column. If you ask the question "How tall is John Doe?" (in other words, how much height does John Doe have?), the answer is 52.0 inches. Quantitative Data usually has units of measurement; in this case the unit of measurement is inches.
Another important fact is that Quantitative Data can be larger or smaller than other Quantitative Data. For example, if we ask the question "Is John Doe taller than Andrea Garcia?" (in other words, does John Doe have more height than Andrea Garcia?), it can be answered by comparing their entries in the Height column. John Doe’s height (52.0) is greater than Andrea Garcia’s (50.8), so we can say yes, he is taller.
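The comparison above can be sketched in Pyret using the two heights from the table. The identifiers john-height, andrea-height, and john-is-taller are hypothetical names chosen for this example.

```pyret
# Heights (in inches) copied from the Height column of the example table.
# These identifiers are hypothetical, chosen only for illustration.
john-height = 52.0
andrea-height = 50.8

# Quantitative data can be compared: is John taller than Andrea?
john-is-taller = john-height > andrea-height
```

Evaluating john-is-taller in the interactions window gives true, because 52.0 is greater than 50.8.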
Now look at the Eye-Color column.
First | Last | Eye-Color | Height
"John" | "Doe" | "Green" | 52.0
"Jane" | "Smith" | "Brown" | 49.1
"Javon" | "Jackson" | "Brown" | 57.7
"Angela" | "Enriquez" | "Hazel" | 52.5
"Jack" | "Thompson" | "Blue" | 53.0
"Dominique" | "Rodriguez" | "Hazel" | 51.1
"Sammy" | "Carter" | "Blue" | 56.2
"Andrea" | "Garcia" | "Brown" | 50.8
Can we ask the question "Does John Doe have more eye color than Andrea Garcia?" No. That question makes no sense! This is the second kind of data we will look at, called Categorical Data. We used Quantitative Data to quantify; to ask "how much" there is of something. We use Categorical Data to ask "which one"? In this case, students are put into the same category if they have the same eye color.
Let’s consider eye color. How many different eye colors are there?
Guide students towards this list: Amber, Blue, Green, Brown, Grey, Hazel.
So if there are only 6 different natural eye colors, then each value in the column has to be one of these categories. We would say that this column has 6 different possible categories.
Navigate back to your Pyret program that loads the nutrition and presidents tables.
Then, turn to Page 6 in your workbook and answer the questions about these two data sets.
This workbook assignment could also become a homework assignment, or be made into a quiz/jeopardy style game.
Extracting Lists from Tables
(Time 20 minutes)
Tables are 2-dimensional collections of data, but we often want to ask 1-dimensional questions of them. For example, if we ask "what is the lowest amount of sodium on the menu?" or "what is the biggest serving size possible?", these are questions that only involve one column at a time. We need some way of looking at each column individually in our programs. In other words, we often want to get 1-dimensional data out of a 2-dimensional table.
The extract operation does just that!
sodium-list = extract sodium from nutrition end
What is the name of the identifier being defined here? What do you think its value is?
After running this program, typing sodium-list into the interactions window and hitting Return gives us the following list: [list: 480, 680, 820, 360, 1300, 790, 160, 150, 680, 130].
Let’s examine this line of code, piece by piece.
extract tells Pyret that we want to take a particular column out of a table.
After the extract keyword, we choose the name of 1 column we want to see as a list. In this case, it is the sodium column.
Then comes the from keyword, which indicates which table we are extracting the column from. Following this is the table name, which is nutrition.
Finally, an end keyword tells Pyret that our line of code involving a table is done.
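The pieces described above can be tried on a small stand-in table, so that the example runs without loading anything from Google Sheets. This is a sketch under assumptions: the table menu-sample, its item column, and the item names are hypothetical; only the three sodium values are taken from the start of sodium-list.

```pyret
# A hypothetical stand-in for the nutrition table, with just two columns.
# The item names are made up; the sodium values are the first three
# entries of sodium-list.
menu-sample = table: item, sodium
  row: "item-a", 480
  row: "item-b", 680
  row: "item-c", 820
end

# extract <column> from <table> end produces that column as a List.
sample-sodium-list = extract sodium from menu-sample end
```

Here sample-sodium-list evaluates to [list: 480, 680, 820]: the same values as the sodium column, in the same order, but without the header or the other column.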
This is the first example of an expression that consumes a table that students will need to write for themselves. These expressions (extract, sieve, select, order, and extend) have fundamentally different syntax than programs they have seen with arithmetic or function application.
Lists are a new type, and we write the type of a list in terms of its contents. For example, a List of Numbers would be of type List<Number>. How do you think you would write the type for a List of Strings?
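For teachers who want to show where such a type can appear in code, here is a minimal sketch that assumes Pyret's :: annotation syntax on a definition. The identifier sample-numbers is hypothetical, and its values are the first three entries of sodium-list.

```pyret
# A hypothetical list of Numbers, annotated with its type.
# The `:: List<Number>` part states that this name holds a List of Numbers.
sample-numbers :: List<Number> = [list: 480, 680, 820]
```

The annotation is optional; the definition works the same way without it, but writing the type next to the name makes it easier to see what kind of data the list contains.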
Define a list called state-list containing the home-state column from presidents.
Define a list called calories-list containing the calories column from nutrition.
Define a list called sugar-list that contains the sugar column from nutrition.
Notice that these Lists contain just one type of data: either only Strings, or only Numbers.
In what ways are Lists different from Tables? Tables are 2-dimensional, while Lists are 1-dimensional. Tables also have a header, which associates a name with each column. Lists, on the other hand, have no header.
However, Lists do share some qualities with Tables. They have multiple entries, and those entries are in a specific order. They can also be filled with either quantitative or categorical data. In the next lesson, we’ll learn about functions that let us ask questions of lists, to help us look for ways to measure a set of data.
Scatterplots
(Time 15 minutes)
Do foods with more sugar tend to have more calories? Now that we have calories and sugar extracted as lists, we can plot these data points and find out! Once again, we’ll want to include the plot-list file, and all the functions defined within it. This time, instead of using function-plot we’ll use scatter-plot, which takes in two lists of numbers.
What do you think the contract is for scatter-plot? Copy it down into your Contracts page. Once we’ve created the series, it’s time to plot it.
Use the draw-plot function to draw this scatter plot. (Go back to the previous unit if you forget how to display your plot.) What do you notice about this plot? Is there a relationship between sugar and calories? Take two minutes and write your answer on Page 4. Does this support your hypothesis or not?
What other kinds of relationships can you find in these tables?
Later on in this class, you’ll learn how to plot many kinds of data, and how to search for trends and relationships like this one!
Have the class debrief their findings. Did anyone’s mind change after looking at the data? Is the data convincing or not? Why or why not?
Choose Your Dataset
(Time 15 minutes)
Now that you’ve had a chance to look at a few tables, it’s time to choose a dataset of your own! Throughout this course, you’ll be analyzing this dataset and writing up your findings. As you learn new tools for data science, you’ll continue to refine this analysis, answering questions and raising new ones of your own!
Take 5 minutes to look through the following datasets, and choose one that interests you:
Or find your own dataset, and use this (Blank Starter file) for your project.
Once you’ve found a Starter file for a dataset that interests you, click "Save a Copy" and save the project to your own account.
You’ll be adding to this file as we go, and keeping a written repository of your work and your findings as well. We’ve created a starter file for this Report as well - save a copy to your account, and be sure to bookmark the page so you can return to it later.
Take 5 minutes to fill in your name, and answer questions 1-3 in your Report.
See the Sample Report for an exemplar of student work.
Closing
(Time 5 minutes)
Congratulations! You’ve just learned the basics of the Pyret programming language, and how to use that language to answer a data science question. Throughout this course, you’ll learn new and more powerful tools that will allow you to answer more complex questions, and in greater detail.
Make sure to save your work. Hit the Save button in the top left. This will save your program in the code.pyret.org folder within your Google Drive.
If your students are working in pairs/groups, make sure that each student has access to a version of the program. The student who saved the program to their Google Drive can share their program with anyone by hitting the Publish button in the top left, choosing "Publish a new copy", then clicking the "Share Link" option. This will allow them to copy a link to the program, then send it to their partners in an email/message.