Review
You’ve learned a lot in this class about how to analyze data. What questions matter to you?
Come up with a question that you want answered about the world around you.
Using what you know now, what information would you need to collect in order to answer it?
What subsets would you need to create? What analysis would you need to perform?
Debrief as a class.
Threats to Validity
Overview
Learning Objectives
Students learn about threats to validity, such as insufficient sample size, selection bias, sampling error, and confounding variables.
Threats to Validity (Time 20 minutes)
Survey says: "People prefer cats to dogs"
As good Data Scientists, the staff at the animal shelter is constantly gathering data about their animals, their volunteers, and the people who come to visit. But just because they have data doesn’t mean the conclusions they draw from it are correct! For example: suppose they surveyed 1,000 cat-owners and found that 95% of them thought cats were the best pet. Could they really claim that people generally prefer cats to dogs?
Have students share back what they think. The issue here is that cat-owners are not a representative sample of the population, so the claim is invalid.
There’s more to data analysis than simply collecting data and crunching numbers. In the survey of cat-owners, the claim that "people prefer cats to dogs" is invalid because the sample wasn’t representative of the whole population (of course cat-owners are partial to cats!). This is just one example of what are called Threats to Validity.
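To see how much a non-representative sample can distort a result, here is a minimal simulation sketch in Python. Everything in it is a made-up assumption for illustration: the population size, the share of people who own cats (30%), and how often owners and non-owners prefer cats (95% and 35%). The function name `simulate_survey` is hypothetical, too. The sketch compares a survey of cat-owners only with a random sample of everyone.

```python
import random


def simulate_survey(seed=0, population_size=100_000, sample_size=1_000):
    """Compare a cat-owners-only survey with a random sample (all numbers hypothetical)."""
    rng = random.Random(seed)

    # Build a hypothetical population of (owns_cat, prefers_cats) pairs.
    population = []
    for _ in range(population_size):
        owns_cat = rng.random() < 0.30  # assumption: 30% of people own a cat
        # assumption: 95% of cat-owners prefer cats, 35% of everyone else does
        prefers_cats = rng.random() < (0.95 if owns_cat else 0.35)
        population.append((owns_cat, prefers_cats))

    def share_preferring_cats(people):
        # Fraction of the given people who prefer cats.
        return sum(1 for _, prefers in people if prefers) / len(people)

    # Biased survey: only cat-owners are asked.
    cat_owners = [person for person in population if person[0]]
    biased_sample = rng.sample(cat_owners, sample_size)

    # Fairer survey: everyone has the same chance of being asked.
    random_sample = rng.sample(population, sample_size)

    return {
        "population": share_preferring_cats(population),
        "cat_owners_only": share_preferring_cats(biased_sample),
        "random_sample": share_preferring_cats(random_sample),
    }


results = simulate_survey()
for name, share in results.items():
    print(f"{name}: {share:.1%}")
```

Under these assumptions, the cat-owners-only survey should report a much higher share than the population as a whole, while the random sample should land close to the true value. Both surveys ask the same number of people, so the gap comes from who was asked, not from how many.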
On Page 54 and Page 55, you’ll find four different claims backed by four different datasets. Each one of those claims suffers from a serious threat to validity. Can you figure out what those threats are?
Give students time to discuss and share back. Sample answers: the dog-park survey is not a random sample; the dogs are friendlier towards whoever is giving them food; etc.
Life is messy, and there are always threats to validity. Data Science is about doing the best you can to minimize those threats, and to be up front about what they are whenever you publish a finding. When you do your own analysis, make sure you include a discussion of the threats to validity!
On Page 56, you’ll find some deliberately misleading claims made by slimy Data Scientists. Can you figure out why these claims should not be trusted? Once you’ve finished, consider your own dataset and analysis: what misleading claims could someone make about your work? Turn to Page 57, and come up with four misleading claims based on data or displays from your work. Then trade papers with another group, and see if you can figure out why each other’s claims are not to be trusted!
Your research paper
Your research paper (Time flexible)
Now that you’ve completed your analysis, it’s time to write up your findings!
Open the Research Paper template, and save a copy to your Google Drive.
Each section of the research paper refers back to the work you’ve done in the Student Workbook. Use these pages and your program to write your findings!
Bootstrap:Data Science by Emmanuel Schanzer, Nancy Pfenning, Emma Youndtsmith, Jennifer Poole, Shriram Krishnamurthi, Joe Politz and Ben Lerner was developed partly through support of the National Science Foundation (awards 1535276, 1647486, and 1738598), and is licensed under a Creative Commons 4.0 Unported License. Based on a work at www.BootstrapWorld.org. Permissions beyond the scope of this license may be available by contacting schanzer@BootstrapWorld.org.