instagram

Students learn about Categorical and Quantitative data, are introduced to Tables by way of the Animals Dataset, and consider what questions can and cannot be answered with available data.

Lesson Goals

Students will be able to…​

  • Explain the difference between Categorical and Quantitative data

  • Identify whether a variable in a dataset is Categorical or Quantitative

  • Identify the Header Row and Identifier Column of a Table

Student-facing Lesson Goals

  • Let’s learn about data inside tables.

Materials

Preparation

  • Decide how the first activity (opening questions) will be run. Will questions be printed for each student, group of students, or posted around the room. Note: these are just ideas to get you started. Use questions that you know will interest your students!

  • Decide how students will be grouped in pairs.* You will need a computer for each student (or pair), with access to the internet

  • Each student (or pair of students) should have a Google Account.

  • Make sure student computers can access the Animals Spreadsheet and the Animals Starter File.

  • Students should have Student workbook and something to write with.

Supplemental Resources

Glossary
categorical data

data whose values are qualities that are not subject to the laws of arithmetic.

data row

a structured piece of data in a dataset that typically reports all the information gathered about a given individual

data science

the science of collecting, organizing, and drawing general conclusions from data, with the help of computers

header

the titles of each column of a table, usually shown at the top

identifier column

a column of unique values which identify all the individual rows (e.g. - student IDs, SSNs, etc)

programming language

a set of rules for writing code that a computer can evaluate

quantitative data

number values for which arithmetic makes sense

🔗Introduction 20 minutes

Overview

Students look at opening questions, either at their desks or in a walk around the room. They select a question they are personally interested in, and think about the data required to answer that question. This process draws a direct line between answering questions they care about and the basics of data science.

Launch

  • Give students 2 minutes to choose a question that grabs their attention, and group themselves by question. Ideally, no student will be the only one interested in that question.

  • Have students spend 2 minutes coming up with a hypothesis about what the answer is, and explaining why. Does every student in a single question-grouping have the same answer?

Investigate

  • What information would you collect to answer this question? Give students 5 minutes to think about what information they would need to collect, to find the answer.

Common Misconceptions

Students may lean towards questions about individuals, instead of questions about what’s true for a group of individuals who vary from one to another. For example, instead of wondering what movie gets the highest rating, they should ask what’s the typical rating for movies in a list, or how much those ratings tend to vary.

Synthesize

Have students share back the different data they would gather to answer their questions. For each question, students would likely have to gather many different kinds of data. If we wanted to find out if small schools are better than big schools, for example, we might want to gather data on SAT scores, college acceptance, etc. Each of these is a variable in our dataset: any two schools we look at could vary by each of them.

What is the most popular movie of all time? Is Climate Change real? How long do quarterbacks tend to stay in the league? Is Stop-and-Frisk racially biased? We can’t survey every school in the world, get data on every movie ever made, or every police action - but we can do an analysis for a sample of them, and try to infer something about all of them as a whole. These questions quickly turn into a discussion about data — how you assess it, how you interpret the results, and what you can infer from those results. The process of learning from data is called Data Science. Data science techniques are used by scientists, business people, politicians, sports analysts, and hundreds of other different fields to ask and answer questions about data.

We’ll use a programming language to investigate these questions. Just like any human language, programming languages have their own vocabulary and grammar that you will need to learn. The language you’ll be learning for data science is called Pyret.

🔗Optional: Which Questions can we Answer? 10 minutes

Overview

Datasets are useful for answering questions, but they can’t answer all the questions that we will wonder about for a given topic. In this activity students will look at a small dataset about a cyclist’s training rides and think about how they could use the table to answer the question or why they cannot answer the question from the table.

Launch

Which of you like to ride bikes? What data might you collect about bike rides?

Investigate

Open to What Questions Can You Answer with the Given Data? This page includes a small dataset about a cyclist’s training rides and a set of questions. The data can be used to answer some, but not all, of the questions. With your partner, read each question. If it can be answered with what we know, explain how you could use the table to answer it. If it can’t be answered using the table, explain why not.

Synthesize

Discuss students' findings and check for questions.

🔗Meet the Animals! 25 minutes

Overview

Students explore the Animals Dataset, sharing observations and familiarizing themselves with the idiosyncrasies and patterns in the data. In the process, they learn about Categorical and Quantitative data.

Notice and Wonder Pedagogy

This pedagogy has a rich grounding in literature, and is used throughout this course. In the "Notice" phase, students are asked to crowd-source their observations. No observation is too small or too silly! Students may notice that the animals table has corners, or that it’s printed in black ink. But by listening to other students' observations, students may find themselves taking a closer look at the dataset to begin with. The "Wonder" phase involves students raising questions, but they must also explain the context for those questions. Sharon Hessney (moderator for the NYTimes excellent What’s going on in this Graph? activity) sometimes calls this "what do you wonder…​and why?". Both of these phases should be done in groups or as a whole class, with time given to each.

Launch

Have students open the Animals Spreadsheet in a browser tab, or turn to The Animals Dataset (Page 2) in their Student Workbooks.

Investigate

This table contains data from an animal shelter, listing animals that have been adopted. We’ll be analyzing this table as an example throughout the course, but you’ll be applying what you learn to a dataset you choose as well.

  • Turn to Questions and Column Descriptions (Page 4) in your Student Workbook. What do you Notice about this dataset? Write down your observations in the first column.

  • Sometimes, looking at data sparks questions. What do you Wonder about this dataset, and why? Write down your questions in the second column.

  • There’s a third column, called “Answered by Dataset” — we’re going to return to that later, so you can ignore it for now.

  • If you look at the bottom of the spreadsheet file, you’ll see that this document contains multiple sheets. One is called "pets" and the other is called "README". Which sheet are we looking at?

  • Each sheet contains a table. For our purposes, we only care about the animals table on the "pets" sheet.

Any two animals in our dataset may have different ages, weights, etc. Each of these is called a variable in the dataset.

Data Scientists work with two broad kinds of data: Categorical Data and Quantitative Data. Categorical Data is used to classify, not measure. Categories aren’t subject to the laws of arithmetic. For example, we couldn’t ask if “cat is more than lizard”, and it doesn’t make sense to "find the average ZIP code” in a list of addresses. “Species” is a categorical variable, because we can ask questions like “which species does Mittens belong to?"

What are some other categorical variables you see in this table?

Quantitative Data is used to measure an amount of something, or to compare two pieces of data to see which is less or more. If we want to ask “how much” or “which is most”, we’re talking about Quantitative Data. "Pounds" is a quantitative variable, because we can talk about whether one animal weighs more than another or ask what the average weight of animals in the shelter is.

We use Categorical Data to answer “what kind?”, and Quantitative Data to answer "how much?".

Synthesize

Have students share back their noticings (statements) and wonderings (questions), and write them on the board.

Data Science is all about using a smaller sample of data to make educated guesses about a larger population. It’s important to remember that tables are only a sample of a larger population: this table describes some animals, but obviously it isn’t every animal in the world! Still, if we took the average age of the animals from this particular shelter, it might tell us something about the average age of animals from other shelters.

🔗Meet Pyret! 10 minutes

Overview

Students open up the Pyret environment (code.pyret.org, or "CPO") and see the Animals Dataset reflected there.

Launch

Let’s take a look at our programming environment, and see what the Animals Dataset looks like there.

Open the Animals Starter File in a new tab. Click “Connect to Google Drive” to sign into your Google account. This will allow you to save Pyret files into your Google Drive.

Next, click the "File" menu and select "Save a Copy". This will save a copy of the file into your own account, so that you can make changes and retrieve them later.

Click "Run" to tell Pyret to read the code on the left-hand side. Anytime something on the left changes, we need to click "Run" to give Pyret the hint that something has changed.

Investigate

  • On the right-hand side, type animals-table and hit the "Enter" or "Return" key.

  • What happens?

  • Look on the left-hand side of the screen. Where is Pyret getting animals-table from?

The first few lines on the lefthand side of the screen tell Pyret to import files from elsewhere, which contain tools we’ll want to use for this course. We’re importing a file called Bootstrap:Data Science, as well as files for working with Google Sheets, tables, and images:

include shared-gdrive("Bootstrap-DataScience-...")
include gdrive-sheets
include tables
include image

After that, we see a line of code that defines shelter-sheet to be a spreadsheet. This table is loaded from Google Drive, so now Pyret can see the same spreadsheet you do. (Notice the funny scramble of letters and numbers in that line of code? If you open up the Google Sheet, you’ll find that same scramble in the address bar! That scramble is how the Pyret editor knows which spreadsheet to load.) After that, we see the following code:

# load the 'pets' sheet as a table called animals-table
animals-table = load-table: name, species, sex, age, fixed, legs, pounds, weeks
  source: pets-sheet.sheet-by-name("pets", true)
end

The first line (starting with #) is called a Comment. Comments are notes for humans, which the computer ignores. The next line defines a new table called animals-table, which is loaded from the shelter-sheet defined above. We also create names for the columns: name, species, sex, age, fixed, legs, pounds and weeks. We could use any names we want for these columns, but it’s always a good idea to pick names that make sense!

Even if your spreadsheet already has column headers, Pyret requires that you name them in the program itself.

Every table is made of cells, which are arranged in a grid of rows and columns. The first row and first column are special. The first row is called the header row, which gives a unique name to each variable (or “column”) in the table. The first column in the table is the identifier column, which contains a unique ID for each row. Often, this will be the name of each individual in the table, or sometimes just an ID number.

Below is an example of a table with one header row and two data rows:

name species sex age fixed legs pounds weeks

"Sasha"

"cat"

"female"

1

false

4

6.5

3

"Mittens"

"cat"

"female"

2

true

4

7.4

1

  • How many variables are listed in the header row for the Animals Dataset? What are they called? What is being used for the identifier column in this dataset?

  • Try changing the name of one of the columns, and click "Run". What happens when you try to out the table?

  • What happens if you remove a column from the list? Or add an extra one?

After the header, Pyret tables can have any number of data rows. Each data row has values for every column variable (nothing can be left empty!). A table can have any number of data rows, including zero, as in the table below:

name species sex age fixed legs pounds weeks

Pyret lets us use many different kinds of data. In the animals table, for example, there are Numbers (the number of legs each animal has), Strings (the species of the animal), and Booleans (whether it is true or false that an animal is fixed).

Synthesize

Once you know how to program, you can do a lot with datasets:

  • Data Scientists display tables as all kinds of charts and graphs. For example, we might want to make a pie chart showing how many animals of each species we have.

  • Sometimes they want to filter a table, showing only a few of the rows. For example we might only want to look at animals where species is equal to "dog".

  • Or perhaps we want to build a column! For example, there could be a vaccination for all cats under the age of 3, and we want to add a vaccinate column that says true or false for animal.

In this course, you’ll be learning to do all three in Pyret: Display, Filter, and Build.

What are some other examples each?

These materials were developed partly through support of the National Science Foundation, (awards 1042210, 1535276, 1648684, and 1738598). CCbadge Bootstrap:Data Science by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.