(Also available in CODAP)

Students explore sampling and probability as a mechanism for detecting patterns. After exploring this in a binary system (flipping a coin), they consider the role of sampling as it applies to relationships in a dataset.

Lesson Goals

Students will be able to…​

  • Understand the connection between probability and inference

  • Understand the need for random samples

  • Understand the role of sample size

  • Take random samples from a population

Student-facing Lesson Goals

  • Let’s explore what random sampling has to do with seeing trends



bias

prejudice in favor of or against one outcome, person, or group compared with another, usually in a way considered to be unfair.

null hypothesis

the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.

random sample

a subset of individuals chosen from a larger set, such that each individual has the same probability of being chosen

sample size

the number of participants or observations included in a study

statistical inference

using information from a sample to draw conclusions about the larger population from which the sample was taken

🔗How to Spot a Scam 10 minutes


Students consider a classic randomness scenario: the probability that a coin will land on heads or tails. From a data science perspective, this can be flipped from a discussion of probability to one of inference. Specifically, "how many samples do we need, to determine whether a coin is fair or not?"


🖼Show image A stranger on the street invites you to a game of chance. They’ll flip a coin, and you’ll win money if you correctly predict whether it lands heads or tails. If you guess wrong, however, you pay THEM.

"It’s a pure game of chance", they tell you, "we each have equal odds of winning".

  • What do you think? Can you trust them to play fair?

  • For a fair coin, what are the chances of it landing heads? Tails?

    • A fair coin has a 50% chance of landing heads and a 50% chance of landing tails.

  • How do you know if a coin is fair or not?

    • Flip it! The more flips you make, the more accurately you can assess if it is fair or not.


A fair coin should land on "heads" about as often as it lands on "tails".

When we approach a strange coin, we start out assuming that it’s not biased towards heads or tails - that it will land on both "heads" and "tails" about 50% of the time. This is called the null hypothesis. A weighted coin, on the other hand, might be heavier on one side, creating a bias toward heads or tails more often! But how do we test whether a coin is fair or biased? How do we test the null hypothesis?

Have students share back their sample results, and their predictions after 5 samples and then 20 samples.

Which samples seem to support the null hypothesis? Which ones undermine it?

In Statistics and Data Science, samples like these don’t prove any claim about the coins! Instead, they either produce evidence for or against a claim. The larger the sample, the more evidence we have to support or reject the null hypothesis.

The chances of getting tails from a fair coin three times in a row aren’t bad at all: one-in-eight! Maybe it was just the luck of the draw, and the coin is still fair.

But what are the chances of flipping "heads" 10 times in a row? 100 times? We might say "There’s not even a one-in-a-million chance of a fair coin coming up heads 100 times in a row. No way is this coin fair!"

But of course, there is a way. It’s just…​incredibly unlikely.
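For teachers who want to check the arithmetic, here is a minimal sketch in Python (the lesson itself requires no code, and the helper name `prob_streak` is our own):

```python
# Chance that a fair coin lands the same way k times in a row:
# multiply 1/2 by itself k times.
def prob_streak(k, p=0.5):
    """Probability of k identical outcomes in a row, for a coin
    that lands on that side with probability p."""
    return p ** k

print(prob_streak(3))   # three tails in a row: 0.125, i.e. one-in-8
print(prob_streak(10))  # ten heads in a row: about 0.001 (one-in-1024)
```

The same function shows why 100 heads in a row is so damning: `prob_streak(100)` is a vanishingly small number.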

Going Deeper: p-value

Statisticians would use more formal language to describe this highly unlikely, "one in a million chance". They call it the p-value, and use a decimal to represent the chance that a pattern is found in a sample, when no such pattern exists in the population.
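The translation is just division. A quick sketch in Python (the helper name `p_value` is ours, purely for illustration):

```python
# A p-value is a "c-in-N chance" written as a decimal: p = c / N.
def p_value(chances, out_of):
    return chances / out_of

print(p_value(1, 10))         # 0.1
print(p_value(1, 100))        # 0.01
print(p_value(2, 100))        # 0.02
print(p_value(1, 1_000_000))  # 1e-06, i.e. 0.000001
```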

Most people say…                        Statisticians say…

"There’s a 1-in-10 chance of this"      "The p-value is 0.1"
"There’s a 1-in-100 chance of this"     "The p-value is 0.01"
"There’s a 2-in-100 chance of this"     "The p-value is 0.02"
"There’s a one-in-a-million chance"     "The p-value is 0.000001"

Common Misconceptions

Students may think that any sample from a fair coin should have an equal number of heads and tails outcomes. That’s not true at all! A fair coin might land on "tails" three times in a row! The fact that this is possible doesn’t mean it’s likely. Landing on "tails" five times in a row? Still possible, but much less likely.

This is where arithmetic thinking and statistical thinking diverge: it’s not a question of what is possible, but rather what is probable.


  • Suppose we are rolling a 6-sided die. How could we tell if it’s weighted or not?

    • We could record how many times the die landed on each number. If the die is fair, we should see that it lands on each number approximately equally.

  • Could a coin come up "heads" twice in a row, and still be a fair coin? Why or why not? What about 10 times in a row? 20?

The coin could be fair in all of these instances! Heads 20 times in a row, however, is extremely unlikely.

  • What is the relationship between how weighted a coin is, and how many samples you need to figure it out?

A fair coin should land on heads about 50% of the time. If a coin landed on heads 100% of the time, it wouldn’t take long to figure out that something was up! A trick coin that comes up heads 60% of the time, however, would need a much larger sample to be detected. The smaller the bias, the larger the sample we need to see it. A small bias might be enough to guarantee that a casino turns a profit, and be virtually undetectable without a massive sample!
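A quick simulation makes this concrete. The following Python sketch is for the teacher’s reference only; the helper `sample_heads` and the 90%/55% weightings are invented examples:

```python
import random

random.seed(1)  # fixed seed so the demonstration is repeatable

def sample_heads(p, n):
    """Flip a coin that lands heads with probability p, n times;
    return the observed fraction of heads."""
    return sum(random.random() < p for _ in range(n)) / n

# A heavily weighted coin (90% heads) stands out even in a small sample;
# a slightly weighted one (55% heads) is hard to tell from fair until
# the sample gets large.
for p in [0.9, 0.55]:
    for n in [10, 100, 10_000]:
        print(f"true p = {p}, flips = {n}: observed {sample_heads(p, n):.2f}")
```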

🔗Probability v. Inference 30 minutes


Statistical inference involves looking at a sample and trying to infer something you don’t know about a larger population. This requires a sort of backwards reasoning, kind of like making a guess about a cause, based on the effect that we see.


In the coin-flip activity, we took samples of each coin and used our intuition about chance and probability to make inferences about whether the coins were fair or weighted.

In other words, we knew how the coin should behave before we even started sampling coin-flips, and then checked to see if the samples matched our expectation.

In statistics, we run the process in reverse: we take a sample and then infer something we didn’t know to begin with.

Inference Reasons Backwards; Probability Reasons Forwards.

Statistical inference is used to gain information in practically every field of study you can imagine: medicine, business, politics, history; even art!

Suppose we want to estimate what percentage of all Americans plan to vote for a certain candidate. We don’t have time to ask every single person who they’re voting for, so pollsters instead take a sample of Americans, and infer how all Americans feel based on the sample.

Just like our coin-flip, we can start out with the null hypothesis: assuming that the vote is split equally. Flipping a coin 10 times isn’t enough to infer whether it’s weighted, and polling 10 people isn’t enough to prove that one candidate is in the lead. But if we survey enough people we can infer something about the whole population.
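As a sketch of the idea (the 52/48 split, the population size, and the `poll` helper are all invented for illustration):

```python
import random

random.seed(0)

# Invented population: 100,000 voters, 52% of whom plan to vote for A.
population = ["A"] * 52_000 + ["B"] * 48_000

def poll(n):
    """Survey n randomly chosen voters; return the share supporting A."""
    sampled = random.sample(population, n)
    return sum(vote == "A" for vote in sampled) / n

print(poll(10))     # a tiny poll can easily miss the true lead
print(poll(5_000))  # a large random poll lands close to the true 0.52
```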

Sample size matters!

  • We’re taking a survey of religions in our neighborhood. There’s a Baptist church right down the street, so we could get a nice big sample by asking everyone there…​right?

Sampling only this group would suggest that everyone in the neighborhood is Baptist, which is almost certainly not the case!

  • Taking a sample of whoever is nearby is called a convenience sample. Why is a convenience sample a problem in this example?

    • Everyone at the church is Baptist, but the entire neighborhood might not be.

  • Would it be problematic to only call voters who are registered Democrats? To only call voters under 25? To only call regular churchgoers? Why or why not?

    • Calling only certain segments of the population will not reveal the way an entire population will vote.

Bad samples can be an accident - or malice!

When designing a survey or collecting data, Data Scientists need to make sure they are working hard to get a good, random sample that reflects the population. Lazy surveys can result in some really bad data! But poor sampling can also happen when someone is trying to hide something, or to oppress or erase a group of people.

  • A teacher who wants the class to vote for a trip to the dinosaur museum might only call on the students who they know love dinosaurs, and then say "well, everyone I asked wanted that one!"

  • A mayor who wants to claim that they ended homelessness could order census-takers to only talk to people in verified home addresses. Since homeless people don’t typically have an address, the census would show no homeless people in the city!

  • A city that is worried about childhood depression could survey children to ask about their mood…​but only conduct the survey at an amusement park!

Can you think of other examples where biased sampling has been used - or could be used - to harm people?


The main reason for doing inference is to guess about something that’s unknown for the whole population.

A useful step along the way is to practice with situations where we happen to know what’s true for the whole population. As an exercise, we can keep taking random samples from that population and see how close they tend to get us to the truth.

The Animals Dataset we’ve been using is just one sample taken from a very large animal shelter.

How much can we infer about the whole population of hundreds of animals, by looking at just this one sample?

Let’s see what happens if we switch from smaller to larger sample sizes.

Divide the class into groups of 3-5 students.

The most accurate samples are random, and large!
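Teachers can preview the exercise with a made-up stand-in for the shelter population (the 40% figure, the coding of cats as 1, and the helper name are all invented):

```python
import random
import statistics

random.seed(2)

# Invented stand-in for the shelter's full population:
# 1 means "cat", 0 means "not a cat"; the true proportion of cats is 40%.
population = [1] * 400 + [0] * 600

def sample_estimates(n, trials=200):
    """Take `trials` random samples of size n; return each sample's
    estimated proportion of cats."""
    return [sum(random.sample(population, n)) / n for _ in range(trials)]

# Small samples swing widely around the truth; large samples cluster near 0.40.
for n in [10, 100]:
    spread = statistics.stdev(sample_estimates(n))
    print(f"sample size {n}: estimates vary by about ±{spread:.2f}")
```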

Common Misconceptions

Many people mistakenly believe that larger populations need larger samples to represent them. In fact, the formulas Data Scientists use to assess how good a job a sample does are based only on the sample size, not the population size.
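For the statistically inclined, the standard rule of thumb behind this fact is that the 95% margin of error for an estimated proportion from a simple random sample is roughly 1/√n, where n is the sample size; the population size appears nowhere in the formula. A sketch:

```python
import math

# Rough 95% margin of error for an estimated proportion: about 1/sqrt(n).
# Note that the population size appears nowhere in the formula.
for n in [100, 1_000, 10_000]:
    print(f"sample size {n}: margin of error ≈ ±{1 / math.sqrt(n):.1%}")
```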


In a statistics-focused class, or if appropriate for your learning goals, this is a great place to include more rigorous statistics content on sample size, sampling bias, etc.


  • Were larger samples always better for guessing the truth about the whole population? If so, how much better?

  • Why is taking a random sample important for avoiding bias in our analyses?

Project Options: Food Habits / Time Use

Food Habits and Time Use are both projects in which students gather data about their own lives and use what they’ve learned in the class so far to analyze it. These projects can be used as a mid-term or formative assessment, or as a capstone for a limited implementation of Bootstrap:Data Science. Both projects also require that students break down tasks and follow a timeline - either individually or in groups. Rubrics for assessing the projects are linked in the materials section at the top of the lesson.

(Based on the projects of the same name from IDS at UCLA)

These materials were developed partly through support of the National Science Foundation, (awards 1042210, 1535276, 1648684, and 1738598). CCbadge Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting