Intro to CODAP & Displaying Categorical Data
(Also available in CODAP)
Students learn to generate and compare bar graphs, explore other plotting & display functions in CODAP, and (optionally) design an infographic.
Welcome to CODAP! 10 minutes
Overview
Students use the options on CODAP’s Configuration menu to produce displays and plots of the Animals Dataset.
Launch
Where have you seen infographics and graphs used to display data in the real world?
Open the Animals Dataset in CODAP.
-
Click the graph icon from the horizontal toolbar in the upper left. (See toolbar, below.) What appears?
-
Select a dot with your mouse. What happens?
-
Try generating the graph again. Does it look the same or different? Why might this be relevant?
-
What happens when you select a table row? How about multiple table rows?
-
What happens when you click the "eye" icon (to the right of the graph or the table, depending on which is selected)?
If students report that a blank graph appears (rather than a scatterplot), prompt them to whitelist CODAP on their ad-blocker. Ad-blockers do seem to inhibit some of the functionality of CODAP (which will fortunately never advertise to users!).
Initially, the data points are randomly distributed on the graph. Selecting an orange dot reveals the name of that particular animal. Selecting a particular dot causes the table row for that animal to be highlighted in blue. Holding the shift key allows students to select multiple dots in the graphical display, or multiple rows in the table.
Students should observe that when they select a table row (or multiple table rows), the corresponding dots change from orange to turquoise. When they set aside selected and/or unselected cases (by using the "eye" icon), they can temporarily change the number of animals in the dataset (with the option to restore the original dataset).
Students can also resize the window by dragging its borders.
Investigate
Once we have a graph of randomly distributed data points, we can organize the data by selecting attributes from our table that we want to appear on the axes of our graph.
Experiment with creating some bar charts in CODAP by following the steps below.
-
Select the y-axis on your graph (where it says "Click here"). On the drop-down menu that appears, select fixed. What do you notice?
-
Now select the x-axis on your graph and select fixed. How does the graph change?
-
Select the configuration icon (which looks like a bar graph) to the right of the data display. Select fuse dots into bars.
-
Click the ruler icon; test out each of the two options available (count and percentage). What happens?
-
Now, make a bar chart showing how many animals there are of each species by changing the variable on the x-axis to species.
-
Experiment with bar charts, either by clicking on the axis title to display a menu of attributes or by dragging a column title from the table to the axes.
-
Which types of attributes can be displayed with the dots fused into bars? For which types of attributes does CODAP instead offer to create a bar for each point?
To dig deeper into bar charts, have students turn to Bar Chart - Notice and Wonder.
People aren’t Hermaphrodite?
When students make a display of the |
Common Misconceptions
Bar charts look a lot like another kind of chart, the "histogram", which is actually quite different because it displays quantitative data, not categorical data. Making a histogram in CODAP, however, begins much like making a bar chart in CODAP: by creating a dot plot that will then be modified.
Synthesize
Bar charts display how much of the sample belongs to each category. If they are based on sample data from a larger population, we use them to infer the proportion of a whole population that might belong to each category.
Bar charts are mostly used to display categorical columns.
While bars in some bar charts should follow some logical order (alphabetical, small-medium-large, etc), they can technically be placed in any order, without changing the meaning of the chart.
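For readers who want to see the arithmetic behind the count and percentage options, here is a minimal sketch in Python. Python is used purely for illustration and is not part of CODAP; the species list is made-up example data rather than the Animals Dataset, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical sample of a categorical column (not the real Animals Dataset).
species = ["dog", "cat", "dog", "rabbit", "cat", "dog", "cat", "dog"]


def category_counts(values):
    """Return how many times each category appears (bar heights for 'count')."""
    return dict(Counter(values))


def category_percentages(values):
    """Return each category's share of the sample, in percent."""
    total = len(values)
    return {category: 100 * n / total for category, n in Counter(values).items()}


counts = category_counts(species)
percentages = category_percentages(species)

# Print one line per bar; the order of the bars does not change their meaning.
for category in sorted(counts):
    print(category, counts[category], f"{percentages[category]:.1f}%")
```

Either dictionary describes the same chart: the counts say how many cases fall in each category, while the percentages say what share of the sample each category represents.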
Mini Project: Making Infographics
Infographics are a powerful tool for communicating information, especially when made by people who actually understand how to connect visuals to data in meaningful ways. Making Infographics is an opportunity for students to become more flexible math thinkers while tapping into their creativity. This project can be made on the computer or with pencil and paper. There’s also an Infographics Rubric to highlight for you and your students what an excellent infographic includes.
Exploring other Displays 30 minutes
Overview
Students freely explore the CODAP data display options available to them when they select the bar graph icon (also known as the Configuration menu). In doing so, they experiment with new charts and get comfortable with CODAP as a platform for doing data science.
Launch
There are lots of different kinds of charts and plots. Even if you don’t know what these plots are for yet, see if you can figure out how to use them.
-
Open a scatterplot of randomly positioned points derived from the Animals Dataset by clicking on the graph icon again.
-
Now, drag or select the Weight attribute/column to the X-axis.
-
Select the little icon that looks like a ruler; it is the Measure icon. Try producing a box plot by selecting the appropriate option from this menu.
-
With Weight still on the X-axis, drag or select Time to Adoption to the Y-axis.
-
Take another look at the Measure menu, now that your data display is two-dimensional. Which options do you see that did not appear before?
-
Try producing a least squares line (also known as a regression line) by using the Measure menu.
Investigate
Complete Exploring Displays and (More) Exploring Displays.
Common Misconceptions
There are many possible misconceptions about displays that students may encounter here. But that’s ok! Understanding all those other plots is not a learning goal for this lesson. Rather, the goal is to have them develop some loose familiarity.
Synthesize
Today you’ve added more data displays to your toolbox. You can create bar charts to visually display data, and even transform entire tables!
You will have many opportunities to use these concepts in this course, by applying what you’ve learned to answer data science questions.
Additional Exercises:
These materials were developed partly through support of the National Science Foundation,
(awards 1042210, 1535276, 1648684, and 1738598).
Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.
Transformers
(Also available in CODAP)
Students learn about Transformers, which allow them to order, filter, and build columns to extend the animals table.
Sorting Tables, Part 1 15 minutes
Overview
Students sort a column on a table by actually changing the table.
Launch
Open the Animals Dataset in CODAP. How many animals' names start with the letter M? What would make it easier to figure out how many animals' names start with the letter M?
Investigate
Students work in groups or pairs. They sort various columns of the table by clicking the attribute name and selecting either Sort Ascending or Sort Descending from the drop-down menu.
-
Let’s sort the animals alphabetically by name and see if we get the same answer as we did before!
-
Sort the animals by age, from youngest to oldest.
-
Sort the animals from heaviest to lightest.
-
Sort the animals table by how long it took for each animal to be adopted, in ascending order.
Synthesize
-
Can you think of a situation when it would be useful to sort animals from heaviest to lightest?
-
Does sorting animals-table produce a new table, or change the existing one? How do you know?
-
Can you think of a situation where CODAP’s default behavior - to alter the table each time we sort it - might inhibit more sophisticated data analysis?
Sorting Tables, Part 2 10 minutes
Overview
Using the Sort Transformer, students sort rows of a table in ascending or descending order, according to one column. When they use transformers, students are building and modifying a copy of the original table.
Launch
The Transformers plugin allows students to transform datasets to produce new, distinct datasets, rather than modifying the original input dataset itself.
First, show students how they can access the Transformers plugin (see screenshot on right), and invite them to explore the different Transformers available.
Explain to students that the Sort Transformer consumes: (1) a dataset; (2) a formula; (3) the type that the formula evaluates to; and (4) a sort direction (ascending or descending). But what does it produce?
What’s a Contract?!
Contracts help us keep track of the different Transformers we’ll be using, and how they operate on each Row of a table. Every contract has three important parts: the Name, the Domain, and the Range.
Check out the screenshot of the Transformers plugin below. What are the domain and range for The Transformer |
Investigate
-
Open the Transformers plugin, and choose the transformer Sort. Select animals-dataset. In the formula expression box, type Name. Select ascending as the direction and click Apply Transformer. What happens?
-
Next, see what happens when you select descending.
Sort the animals table from heaviest to lightest.
-
Sort the animals table alphabetically by species.
-
Sort the animals table by how long it took for each animal to be adopted, in ascending order.
Common Misconceptions
Students may be more familiar with filters that actually change the table. In CODAP, all transformers produce a brand new table. Filtered tables are automatically saved; CODAP titles each new table with a number in curly braces at the end (for example, Filter(Animals-Dataset) {1}) to indicate how many times the transformer has been applied. When students apply a transformer, they have the option of selecting the original table from the dropdown menu, or a new table that they’ve generated. Students can also rename saved tables, if they’d like.
Synthesize
-
Does the transformer Sort produce a new table, or change the existing one?
-
You’ve now learned two different strategies for sorting a column of a table. What do the two strategies have in common? How are they different?
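The difference between the two sorting strategies can be sketched outside of CODAP. The Python below is only an analogy, not CODAP's implementation: the rows are made-up values, and sort_copy is a hypothetical helper that behaves like a transformer by returning a new list instead of reordering the one it was given.

```python
# Hypothetical rows standing in for a few animals (illustrative values only).
animals = [
    {"name": "Ada", "weight": 6.5},
    {"name": "Moe", "weight": 123.0},
    {"name": "Kit", "weight": 9.2},
]


def sort_copy(rows, key, descending=False):
    """Like a Sort transformer: return a new sorted list, leave the input alone."""
    return sorted(rows, key=lambda row: row[key], reverse=descending)


original_order = [row["name"] for row in animals]

# Strategy 2 (transformer-style): a new table comes back, the input is untouched.
heaviest_first = sort_copy(animals, "weight", descending=True)
unchanged = [row["name"] for row in animals] == original_order

# Strategy 1 (attribute-menu-style): list.sort() reorders the list it is called on.
in_place = list(animals)  # copy first so this demo does not disturb `animals`
in_place.sort(key=lambda row: row["name"])

print([row["name"] for row in heaviest_first])
print([row["name"] for row in in_place])
print("original unchanged:", unchanged)
```

Both strategies produce the same ordering rules; what differs is whether the original table survives for later analysis.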
Filtering Tables 20 minutes
Overview
Students learn how to filter tables by removing Rows.
Launch
Explain to students that you have "Function Cards", which describe the purpose statement of a function that consumes a Row from a table of students, and produces a Boolean (e.g. "this student is wearing glasses"). Select a volunteer to be the "Filter Transformer", and have them randomly choose a function card from the Function Cards set, and make sure they read it without showing it to anyone else.
Have 6-8 students line up in front of the classroom, and have the filter transformer go to each student and say "stay" or "sit" depending on whether their function would return true or false for that student. If they say "sit", the student sits down. If they say "stay", the student stays standing.
Ask the class: based on who sat and who stayed, what function was on the card?
The Filter Transformer takes a dataset and produces a copy of it that contains only the cases for which the given formula evaluates to true.
Suppose we want to get a table of only animals that have been fixed. The Filter transformer consumes a dataset to filter and a formula expression that evaluates to either true or false.
Investigate
-
Open the Transformers plugin, and choose the transformer Filter. Select animals-dataset. In the formula expression box, type Fixed = "TRUE". Apply the transformer. What happens?
-
Does CODAP mind if you forget to capitalize? What about if you leave out quotation marks? Examine the error messages that appear if you are just a little careless as you enter text into the formula expression box.
-
This time, in the formula expression box, type Age > 5. What did you get?
-
Now try Species = "dog".
The Filter Transformer walks through the table, applying whatever formula it was given to each row, and produces a new table containing all the rows for which the formula returned true. Notice that Filter takes a dataset and produces a copy of it that contains only the cases for which the given formula evaluates to true. If the formula consumes anything besides a single Row, or if it produces anything besides a Boolean, we’ll get an error.
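The row-by-row behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CODAP's implementation: the rows are made-up, filter_rows is a hypothetical helper, and the formula is modeled as a Python function that consumes one row and must produce a Boolean.

```python
# Hypothetical rows; attribute names mirror the lesson, the values are made up.
animals = [
    {"Name": "Ada", "Species": "cat", "Age": 2, "Fixed": True},
    {"Name": "Moe", "Species": "dog", "Age": 9, "Fixed": False},
    {"Name": "Kit", "Species": "cat", "Age": 7, "Fixed": True},
]


def filter_rows(rows, formula):
    """Return a new list holding only the rows for which formula(row) is True."""
    kept = []
    for row in rows:
        result = formula(row)
        # Mirror the rule that the formula must produce a Boolean.
        if not isinstance(result, bool):
            raise TypeError("the formula must produce a Boolean")
        if result:
            kept.append(row)
    return kept


fixed_animals = filter_rows(animals, lambda row: row["Fixed"])
older_animals = filter_rows(animals, lambda row: row["Age"] > 5)

print([row["Name"] for row in fixed_animals])
print([row["Name"] for row in older_animals])
print(len(animals), "rows remain in the original table")
```

Because filter_rows builds a fresh list, the original table keeps all of its rows, which is the same guarantee the transformer gives.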
Synthesize
Debrief with students. Some guiding questions on filtering:
-
Suppose we wanted to determine whether cats or dogs get adopted faster. How might using the Filter transformer help?
-
If the shelter is purchasing food for older cats, what filter would we write to determine how many cats to buy for?
-
Can you think of a situation where filtering fixed animals would be helpful?
Building Columns 10 minutes
Overview
Students learn how to build columns, using the Build Attribute transformer.
Launch
Suppose we want to transform our table, converting pounds to kilograms or weeks to days. The Build Attribute transformer makes a new copy of a dataset, and adds a new attribute. We must provide a dataset, a name for the new attribute, an existing collection to add the attribute to, a formula for the attribute’s values, and an indication of the type of value the formula will evaluate to.
Investigate
-
Open the Transformers plugin, and choose the transformer Build Attribute. Select animals-dataset.
-
Enter Young as the Name of New Attribute. Select cases as the Collection to Add To. In the formula expression box, type Age < 5. Apply the transformer. What happens?
-
Now, enter is-cat as the Name of New Attribute and try typing Species = "cat" in the formula expression box. What do you get? What do you think is going on?
The Build Attribute Transformer walks through the table, applying whatever formula expression it was given to each row. Whatever the formula expression produces for that row becomes the value of our new column, which is named based on the string it was given. In the first example, we gave it Age < 5, so the new table had an extra Boolean column for every animal, indicating whether or not it was young.
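As a rough sketch of the same idea, the Python below builds a new attribute for every row. It is an analogy rather than CODAP's implementation: build_attribute is a hypothetical helper, the rows are made-up values, and the formula is modeled as a function of one row.

```python
# Hypothetical rows; attribute names mirror the lesson, the values are made up.
animals = [
    {"Name": "Ada", "Species": "cat", "Age": 2},
    {"Name": "Moe", "Species": "dog", "Age": 9},
    {"Name": "Kit", "Species": "cat", "Age": 7},
]


def build_attribute(rows, name, formula):
    """Return new rows, each a copy of the original plus one computed attribute."""
    return [{**row, name: formula(row)} for row in rows]


# Same formulas as the lesson: Age < 5 and Species = "cat".
with_young = build_attribute(animals, "Young", lambda row: row["Age"] < 5)
with_is_cat = build_attribute(animals, "is-cat", lambda row: row["Species"] == "cat")

for row in with_young:
    print(row["Name"], row["Young"])
```

The original rows never gain the new attribute; only the copies do, which matches the transformer producing a new dataset.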
Synthesize
Debrief with students. Ask them if they can think of a situation where they would want to use this. Some ideas:
-
A dataset from Europe might list everything in metric (centimeters, kilograms, etc), so we could build a column to convert that to imperial units (inches, pounds, etc).
-
A dataset about schools might include columns for how many students are in the school and how many of those students identify as multi-racial. But when comparing schools of different sizes, what we really want is a column showing what percentage of students identify as multi-racial. We could use the Build Attribute transformer to compute that for every row in the table.
Being able to define is a huge upgrade in our ability to analyze data! But as a wise person once said, "with great power comes great responsibility"! Dropping all the dogs from our dataset might be a cute exercise in this class, but suppose we want to drop certain populations from a national census? Even a small programming error could erase millions of people, impact funding for things like roads and schools, etc.
Additional Exercises:
Method Chaining
(Also available in CODAP)
Students learn how to chain Methods together, and define more sophisticated subsets.
Design Recipe Practice 25 minutes
Overview
Students practice more of what they learned in the previous lesson, applying the Design Recipe to make table functions that operate on rows of the Animals Dataset. These become the basis of the chaining activity that follows.
Launch
The Design Recipe is a powerful tool for solving problems by writing functions. It’s important for this to be like second nature, so let’s get some more practice using it!
Investigate
Define the Compute functions on The Design Recipe: is-dog / is-female and The Design Recipe: is-old / name-has-s.
Optional: Combining Booleans Suppose we want to build a table of Animals that are fixed and old, or a table of animals that are cats or dogs? By using the For many of the situations where you might use |
Synthesize
Did students find themselves getting faster at using the Design Recipe? Can students share any patterns they noticed, or shortcuts they used?
Chaining 25 minutes
Overview
Students learn how to compose multiple table operations (sorting, filtering, building) on the same table - a technique called "chaining".
Launch
Now that we are doing more sophisticated analyses, we might find ourselves writing the following code:
# get a table with the nametags of all the fixed animals, ordered by species
with-labels = animals-table.build-column("labels", nametag)
fixed-with-labels = with-labels.filter(is-fixed)
result = fixed-with-labels.order-by("species", true)
That’s a lot of code, and it also requires us to come up with names for each intermediate step! Pyret allows table methods to be chained together, so that we can build, filter and order a Table in one shot. For example:
# get a table with the nametags of all the fixed animals, ordered by species
result = animals-table.build-column("labels", nametag).filter(is-fixed).order-by("species", true)
This code takes the animals-table, and builds a new column. According to our Contracts Page, .build-column produces a new Table, and that’s the Table whose .filter method we use. That method produces yet another Table, and we call that Table’s .order-by method. The Table that comes back from that is our final result.
Teaching Tip
Use different color markers to draw nested boxes around each part of the expression, showing where each Table came from.
It can be difficult to read code that has lots of method calls chained together, so we can add a line-break before each "." to make it more readable. Here’s the exact same code, written with each method on its own line:
# get a table with the nametags of all the fixed animals, ordered by species
animals-table
.build-column("labels", nametag)
.filter(is-fixed)
.order-by("species", true)
Order matters: Build, Filter, Sort.
Suppose we want to build a column and then use it to filter our table. If we use the methods in the wrong order (trying to filter by a column that doesn’t exist yet), we might wind up crashing the program. Even worse, the program might work, but produce results that are incorrect!
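To see why the order matters, here is a small sketch in Python rather than Pyret. The Table class below is a hypothetical stand-in written only for this illustration (it is not Pyret's Table), with method names chosen to echo build-column, filter, and order-by; the rows are made-up values.

```python
class Table:
    """A tiny stand-in table: every method returns a new Table."""

    def __init__(self, rows):
        # Copy each row so that later tables never share state with earlier ones.
        self.rows = [dict(row) for row in rows]

    def build_column(self, name, fn):
        return Table({**row, name: fn(row)} for row in self.rows)

    def filter(self, predicate):
        return Table(row for row in self.rows if predicate(row))

    def order_by(self, column, ascending=True):
        return Table(sorted(self.rows, key=lambda r: r[column], reverse=not ascending))


animals = Table([
    {"name": "Ada", "species": "cat", "age": 2},
    {"name": "Moe", "species": "dog", "age": 9},
    {"name": "Kit", "species": "cat", "age": 7},
])

# Build first, then filter on the new column, then order.
old_by_name = (
    animals
    .build_column("old", lambda r: r["age"] > 5)
    .filter(lambda r: r["old"])
    .order_by("name")
)
print([r["name"] for r in old_by_name.rows])

# Filtering before building fails: the "old" column does not exist yet.
try:
    animals.filter(lambda r: r["old"]).build_column("old", lambda r: r["age"] > 5)
    crashed = False
except KeyError:
    crashed = True
print("filter-before-build crashed:", crashed)
```

In this sketch the wrong order fails loudly with a missing-column error. The quieter danger mentioned above, a chain that runs but gives wrong results, would not be caught this way and still needs careful reading.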
Investigate
When chaining methods, it’s important to build first, then filter, and then order.
How well do you know your table methods? Complete Chaining Methods and Chaining Methods 2: Order Matters in your Student Workbook to find out.
Synthesize
As our analysis gets more complex, chaining is a great way to re-use work we’ve already done. And less duplicate work means a smaller chance of bugs. Chaining is a powerful way to work, so it’s critical to think carefully when we use it!
Additional Exercises
Randomness and Sample Size
(Also available in CODAP)
Students learn about random samples and statistical inference, as applied to the Animals Dataset. In the process, students get a light introduction to the role of sample size and the importance of statistical inference.
- statistical inference
-
using information from a sample to draw conclusions about the larger population from which the sample was taken
Flip the Script: Inference v. Probability 45 minutes
Overview
Statistical inference involves looking at a sample and trying to infer something you don’t know about a larger population. This requires a sort of backwards reasoning, kind of like making a guess about a cause, based on the effect that we see. To better understand the process of going from the sample back to the population, it helps to understand the more straightforward process of going from the population to a sample. If the sample is random, we call this process Probability!
In real life we typically don’t know what’s true for an entire population. But this probability thought-experiment will start with a larger population with known properties (such as the fact that nearly half of the entire population are males). Then we’ll see what kind of behavior we tend to see in random samples taken from that population.
Launch
Inference Reasons Backwards; Probability Reasons Forwards
One of the most useful tasks in Data Science is using sample data to infer (guess) what’s true about the larger population from which the sample was taken. This process, called statistical inference, is used to gain information in practically every field of study you can imagine: medicine, business, politics, history; even art! Early on, statisticians discovered that random samples almost always work best.
Suppose we want to estimate what percentage of all Americans plan to vote for a certain candidate. We can’t ask everyone who they’re voting for, so pollsters instead take a sample of Americans, and generalize the opinion of the sample to estimate how Americans as a whole feel. But choosing a sample can be tricky…
-
Would it be problematic to only call voters who are registered Democrats? To only call voters under 25? To only call regular churchgoers? Why or why not?
-
How could we choose a representative subset, or sample of American voters?
-
Would it be problematic to only sample a handful of voters? What do we gain by taking a larger sample?
Before we infer something unknown about a population from a sample, we need to know what makes a "good" sample!
Sampling is a complicated issue. The main reason for doing inference is to guess about something that’s unknown for the whole population. But a useful step along the way is to practice with situations where we happen to know what’s true for the whole population. As an exercise, we can keep taking random samples from that population and see how close they tend to get us to the truth. Another discovery (besides the value of randomness) that statisticians made early on was something that’s perfectly consistent with common sense: Larger samples are better than smaller ones, because they tend to get us closer to the truth about the whole population.
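The thought-experiment of repeatedly sampling from a population with a known truth can be sketched as a short simulation. The Python below is an illustration only and is not the Expanded Animals Starter File: the population of 1000 animals with 480 males is an assumed example, and mean_abs_error is a hypothetical helper.

```python
import random


def mean_abs_error(population, sample_size, trials, rng):
    """Average distance between the sample proportion and the true proportion."""
    truth = sum(population) / len(population)
    total = 0.0
    for _ in range(trials):
        # Draw a random sample without replacement.
        sample = rng.sample(population, sample_size)
        total += abs(sum(sample) / sample_size - truth)
    return total / trials


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed population: 1 = male, 0 = female, with 480 males out of 1000.
population = [1] * 480 + [0] * 520

small = mean_abs_error(population, 10, 500, rng)
large = mean_abs_error(population, 200, 500, rng)

print(f"typical error with samples of 10:  {small:.3f}")
print(f"typical error with samples of 200: {large:.3f}")
```

Averaged over many random samples, the larger sample size lands closer to the true proportion, which is the pattern students look for in the activity.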
Let’s see what happens if we switch from smaller to larger sample sizes, if we’re taking a random sample of shelter animals to infer what’s true about the larger population…
Students should log into CPO, open the Expanded Animals Starter File (Pyret), and save a copy.
Investigate
The Animals Dataset we’ve been using is just one sample taken from a very large animal shelter. How much can we infer about the whole population of hundreds of animals, by looking at just this one sample?
-
Divide the class into groups of 3-5 students.
-
Have students open the Expanded Animals Starter File (Pyret), and click "Run".
-
Have students complete Sampling and Inference, sharing their results and discussing with the group.
-
For a deeper exploration of the impact of sample size, have students complete Predictions from Samples
Common Misconceptions
Many people mistakenly believe that larger populations need to be represented by larger samples. In fact, the formulas that Data Scientists use to assess how good a job the sample does are based only on the sample size, not the population size.
Extension
In a statistics-focused class, or if appropriate for your learning goals, this is a great place to include more rigorous statistics content on sample size, sampling bias, etc.
Synthesize
Have students share. Were larger samples always better for guessing the truth about the whole population? If so, how much better?
Project Options: Food Habits / Time Use
In both of these projects, students gather data about their own lives and use what they’ve learned in the class so far to analyze it. This project can be used as a mid-term or formative assessment, or as a capstone for a limited implementation of Bootstrap:Data Science. See the project descriptions for Food Habits and Time Use. (Based on the projects of the same name from IDS at UCLA.)
Grouped Samples
(Also available in CODAP)
Students practice creating subsets and think about why it might sometimes be useful to answer questions about a dataset through the lens of specific subsets.
- grouped sample
-
a non-random subset of individuals chosen from a larger set, where the individuals belong to a specific group
Problems with a Single Population 10 minutes
Overview
This activity is all about grouped samples: Students make a bunch of subsets from the Animals Dataset, and see how each subset might answer the same question differently.
Launch
When looking at a scatter plot of our animals, it looks like the amount an animal weighs may have something to do with how long it takes to be adopted.
But if we label the dots by animal, we notice every data point after 25 pounds belongs to a dog from the shelter! The cats are all clumped together in the lower weight range, making it hard to see how weeks to adoption may relate to a cat’s weight.
Investigate
Divide the class into groups of 3-4, with one student identified as the "reporter".
-
Looking at this scatterplot, does it make sense to analyze all the animals together? Why or why not?
-
Are there some questions where it would be important to break up the population into species-specific populations? What are they?
-
Are there some questions where it would be important to keep the whole population together? What are they?
Synthesize
Have the reporters share their findings with the class.
Imagine that you’ve been handed a dataset from a country where half the people are wealthy and have access to amazing medical care, and the other half are poor and have no healthcare. If we took a random sample of the population as a whole, we might think that they are generally middle-income and have average health. But if we ask the same question about the two groups separately, we would discover inequality hiding in plain sight!
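A few lines of arithmetic make this concrete. The Python sketch below uses invented income figures in arbitrary units (they are not real data) to compare the average for the whole population with the averages for each group.

```python
from statistics import mean

# Invented incomes (arbitrary units) for two equally sized groups.
people = [
    ("wealthy", 190), ("wealthy", 210), ("wealthy", 200),
    ("poor", 10), ("poor", 20), ("poor", 30),
]

# Averaging everyone together hides the split between the groups.
overall = mean(income for _, income in people)

# Group the incomes, then average each group separately.
by_group = {}
for group, income in people:
    by_group.setdefault(group, []).append(income)
group_means = {group: mean(incomes) for group, incomes in by_group.items()}

print("overall mean:", overall)
for group, value in group_means.items():
    print(group, "mean:", value)
```

The overall figure sits between the two groups and describes nobody in the data, while the per-group averages expose the gap.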
Grouped Samples 20 minutes
Launch
Ultimately, it might make more sense to ask certain questions about "just the cats" or "just the dogs". Averaging every animal together will give us an answer, but it may not be a useful answer.
Sometimes important facts about samples get lost if we mix them with the rest of the population!
Data Scientists define grouped samples of datasets, breaking them up into sub-groups that may be helpful in their analysis.
Earlier, you learned how to define values in Pyret. We can define Numbers, Strings, Images, and even rows:
name = "Flannery"
age = 16
logo = star(50, "solid", "red")
sasha = animals-table.row-n(0)
Let’s use this skill to define Tables…
Investigate
We already know how to define values, and how to filter a dataset. So let’s put those skills together to define a grouped sample of the dogs in the shelter:
dogs = animals-table.filter(is-dog)
A “kitten” is an animal who is a cat and who is young. How would you define a table of just kittens?
-
Turn to Grouped Samples from the Animals Dataset, and see what code will compute whether or not an animal is a kitten.
-
Can you fill in the code for the other grouped samples?
-
When you’re done, type these definitions into the Definitions Area.
-
Make a bar chart showing the distribution of sex in the kittens subset, by typing bar-chart(kittens, "sex").
-
Make bar charts showing the sex column for every grouped sample. Which one best represents the distribution of species for the whole population? Why?
Synthesize
Debrief with students. Thoughtful question: how could we filter and sort a table? How can we combine methods?
Displaying Samples 20 minutes
Overview
Students revisit the data display activity, now using the samples they created.
Launch
Making grouped and random samples is a powerful skill to have, which allows us to dig deeper than just making charts or asking questions about a whole dataset. Now that we know how to make subsets, we can make much more sophisticated displays!
Investigate
Complete Displaying Data, using what you’ve learned about samples to make more sophisticated data displays.
Synthesize
Were any of the students' displays interesting or surprising? Given a novel question, can students identify what helper functions they would need to write?