Referenced from lesson Randomness and Sample Size (Spring, 2021)
1 In the Definitions Area of the Expanded Animals Starter File, define the following samples:
tiny-sample = random-rows(big-animals-table, 10)
small-sample = random-rows(big-animals-table, 20)
medium-sample = random-rows(big-animals-table, 40)
large-sample = random-rows(big-animals-table, 80)
2 Click run and make a pie-chart of the species in the tiny-sample.
- What animals are in the sample?
- Click run for a new random sample and make another pie-chart of species in the tiny-sample. What animals are in the sample?
- Click run for a new random sample and make another pie-chart of species in the tiny-sample. Based on these samples, how many species of animals do you think are at the shelter?
- Which species do you think there are the most of at the shelter?
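The pie charts in this step can be made by typing an expression like the one below into the Interactions Area after clicking run. This is only a sketch: it assumes the Starter File provides a pie-chart function that takes a table and a column name, and that the species column is named "species". Check the file if either differs.

```pyret
# Assumes tiny-sample is defined as in step 1
# and that the column holding each animal's species is named "species".
pie-chart(tiny-sample, "species")
```

Because random-rows picks rows at random each time the program runs, entering the same expression after each click of run will usually produce a different chart.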
3 What did you learn from taking multiple samples that you wouldn’t have known if you’d only taken a single sample?
4 Now use small-sample to make a pie-chart of the species.
- What animals are in the sample?
- Click run for a new random sample and make another pie-chart of species in the small-sample. What animals are in the sample?
5 Now that you’ve seen small-sample, how has your sense of the distribution of the species changed?
6 Now use medium-sample to make a pie-chart of the species. If there are about 400 animals at the shelter, how many of each species would you predict there to be?
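One way to turn a sample into a prediction is to scale the fraction seen in the sample up to the size of the whole shelter. The sketch below uses made-up numbers (12 cats in a sample of 40) purely for illustration; the names cats-in-sample, sample-size, shelter-size, and predicted-cats are hypothetical and are not part of the Starter File.

```pyret
# Hypothetical numbers: suppose 12 of the 40 animals in medium-sample are cats.
cats-in-sample = 12
sample-size = 40
shelter-size = 400

# Fraction of cats in the sample, scaled up to the whole shelter.
predicted-cats = (cats-in-sample / sample-size) * shelter-size
# predicted-cats is 120
```

Repeat the same calculation with the counts from your own chart for each species. The predictions should add up to about 400.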
7 Now use large-sample to make a pie-chart of the species. If there’s anything you’d like to change about your prediction now that you’ve seen large-sample, record it here.
8 Let’s see how accurate your prediction is… feel free to click run and build a few more pie charts from your samples if you want to collect more information first! When you’re ready, make a pie-chart of animals-table-2.
- Which predictions were closest?
- Which predictions were off?
- Were there any surprises?
9 In the real world, we usually don’t have access to a whole dataset to check predictions against! How could we test…
- Every giraffe on the planet?
- Everyone who has ever come in contact with a covid-positive person?
- Every person who identifies as queer?
What strategies can we use to make sure that predictions from samples are as close to accurate as possible?
These materials were developed partly through support of the National Science Foundation, (awards 1042210, 1535276, 1648684, and 1738598). Bootstrap:Data Science by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.