1 In the Definitions Area of the Expanded Animals Starter File, define the following samples:
tiny-sample = random-rows(more-animals, 10)
small-sample = random-rows(more-animals, 20)
medium-sample = random-rows(more-animals, 40)
large-sample = random-rows(more-animals, 80)
2 Click "Run" and make a pie-chart of the species in the tiny-sample. What animals are in the sample?
Click "Run" for a new random sample, and make another pie-chart for species. What animals are there?
Click "Run" for a new random sample, and make yet another pie-chart for species. Based on these 3 samples, how many species do you think are at the shelter?
Which is the most common species at the shelter?
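As a sketch of what this step might look like in the Interactions Area: the line below assumes that the starter file's pie-chart function takes a table followed by a column name in quotes, and that the column holding each animal's species is named "species". Neither detail is stated on this page, so check the table's header row first.

```pyret
# Assumption: the species column is named "species"
pie-chart(tiny-sample, "species")
```

Because tiny-sample is defined with random-rows, each click of "Run" draws a fresh set of 10 rows, so the same line can produce a different chart each time.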
3 What did you learn from taking multiple samples that you wouldn’t have known if you’d only taken one?
4 Repeat the steps above, but for small-sample. What animals are in the sample?
5 Now that you’ve seen small-sample, how has your sense of the distribution of the species changed?
6 Now use medium-sample to make a pie-chart of the species. If there are about 400 animals at the shelter, how many of each species would you predict there to be?
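One way to turn a sample's pie chart into a prediction is to scale each species' share of the sample up to the size of the shelter. The counts here are made up for illustration and do not come from the dataset: suppose 12 of the 40 animals in medium-sample were dogs.

```pyret
# Hypothetical counts: 12 dogs in a 40-animal sample, scaled to a 400-animal shelter
(12 / 40) * 400
```

This evaluates to 120, so under that assumption you would predict about 120 dogs at the shelter. The same calculation can be repeated for each species, using the counts from your own sample.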
7 Now use large-sample to make a pie-chart of the species. If there’s anything you’d like to change about your prediction now that you’ve seen large-sample, record it here.
8 Let’s see how accurate your prediction is… feel free to click "Run" and build a few more pie charts from your samples if you want to collect more information first! When you’re ready, make a pie-chart of more-animals.
Which predictions were closest?
Which predictions were off?
Were there any surprises?
9 In the real world, we usually don’t have access to a whole dataset to check predictions against! How could we test…
Every giraffe on the planet?
Everyone who has ever come in contact with a covid-positive person?
Every person who identifies as queer?
What strategies can we use to make sure that predictions from samples are as close to accurate as possible?
