The Bootstrap Blog

The Four Ingredients of Data Science Education

Data science is now an essential skill, and districts, curriculum providers, and advisory boards are competing to define “Data Science” from a K-12 perspective. Is it primarily about math and statistics? Computer Science? Who should teach it? What questions should a district ask when deciding how to approach teaching it? At Bootstrap (one of the nation’s leading data science programs, and a leader in integrating computing into other disciplines in K-12) we take a far more holistic view.

A strong and equitable Data Science curriculum combines four ingredients: domains of study, statistics & mathematics, computing, and civic responsibility. Data and questions about them arise in concrete domains such as healthcare, government policy, sports, and scientific research. Statistics & Mathematics provides the analytical tools for answering questions about data. Computing enables transforming, combining, visualizing, and managing data before, during, and after analysis. Civic responsibility helps students understand their roles as both producers and consumers of data and warns about the perils when analysis is done poorly, irresponsibly, or without attention to potential harms.

Most framings for Data Science are driven by math and statistics standards, which almost always fail to address the computational needs of Data Science. Binning data, selecting data subsets, joining columns, and normalizing data are among the many computational tasks that get sequenced and organized using constructs from coding. At a higher level, students need to learn Data Science-centric program design skills. Thus, while the required coding skills may seem modest, math-centric framings that focus only on coding miss the many levels at which computing education matters.
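
To make these tasks concrete, here is a minimal sketch of the four operations named above (joining, selecting a subset, binning, and normalizing), written in plain Python purely for illustration. The dataset, column names, and helper functions are all invented for this example; they are not drawn from any particular curriculum.

```python
# Hypothetical toy data: every name and number here is invented for illustration.
students = [
    {"id": 1, "grade": 9, "hours_slept": 6.5},
    {"id": 2, "grade": 10, "hours_slept": 8.0},
    {"id": 3, "grade": 9, "hours_slept": 7.0},
    {"id": 4, "grade": 11, "hours_slept": 5.5},
]
scores = {1: 72, 2: 88, 3: 80, 4: 64}  # student id -> test score


def join_scores(rows, score_by_id):
    """Join: add a 'score' column by matching each row on its id."""
    return [{**row, "score": score_by_id[row["id"]]} for row in rows]


def select_grade(rows, grade):
    """Select a subset: keep only the rows for one grade level."""
    return [row for row in rows if row["grade"] == grade]


def bin_sleep(rows, width=1.0):
    """Bin: count rows per interval of hours slept, keyed by the interval's lower edge."""
    counts = {}
    for row in rows:
        lower = (row["hours_slept"] // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return counts


def normalize_scores(rows):
    """Normalize: rescale scores to the range 0..1 (min-max scaling)."""
    values = [row["score"] for row in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all scores are equal
    return [{**row, "score_norm": (row["score"] - lo) / span} for row in rows]


# Sequence the steps: each one takes a table and produces a new table or summary.
joined = join_scores(students, scores)
ninth = select_grade(joined, 9)
sleep_bins = bin_sleep(joined)
normalized = normalize_scores(joined)
```

Each step is small, but deciding which steps to apply, in what order, and how to name and reuse them is exactly the kind of program design skill the paragraph above describes.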

Other framings give short shrift to civic responsibility. Students will be creators, users, and even victims of data. Students should understand how the same data can tell multiple stories, how modern applications use data, and the risks of misusing or over-relying on decisions recommended by machine learning. These issues draw deeply on both statistical and computational concepts, but must be explored in rich contexts that are personally and culturally relevant to students and their communities.

High-quality Computer Science does not require substantial programming in an industrial-strength programming language such as Python or R: many lower-floor options exist. Ultimately, Computer Science should teach students how to pre-process data in reproducible ways, express and compose data transformations with precision, and leverage computers to test or sanity check their results. These skills can be initially developed using spreadsheets, then refined using small amounts of programming in languages designed to support these tasks. Simple but powerful bits of programming are manageable for non-Computer Science teachers, yet provide an authentic experience for students. Careful choice of programming tools, considering the needs of both teachers and students, is critical for scaling Data Science education.
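
As a rough illustration of what "small amounts of programming" might look like, the sketch below composes two data transformations into a repeatable pipeline and then sanity checks the result. Python is used here only for concreteness; the same idea can be expressed in a spreadsheet or in a language designed for teaching. The data, the `pipeline` helper, and the plausibility bounds are all assumptions made up for this example.

```python
from functools import reduce


def drop_missing(rows):
    """Remove rows whose temperature reading is missing."""
    return [r for r in rows if r["temp_f"] is not None]


def to_celsius(rows):
    """Add a 'temp_c' column converted from Fahrenheit, rounded to one decimal."""
    return [{**r, "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)} for r in rows]


def pipeline(*steps):
    """Compose transformations so the same cleaning can be re-run on any table."""
    def run(rows):
        return reduce(lambda acc, step: step(acc), steps, rows)
    return run


# Hypothetical raw readings, invented for illustration.
raw = [
    {"city": "A", "temp_f": 68.0},
    {"city": "B", "temp_f": None},
    {"city": "C", "temp_f": 50.0},
]

clean = pipeline(drop_missing, to_celsius)
result = clean(raw)

# Sanity checks: cleaning never adds rows, and values fall in a plausible range.
assert len(result) <= len(raw)
assert all(-90 <= r["temp_c"] <= 60 for r in result)
```

Because the steps are named functions applied in a fixed order, someone else can re-run the same cleaning on the raw data and audit each step, which is the sense of "reproducible" intended above.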

Different curricula can use different quantities of the ingredients so long as the dish remains recognizable. This flexibility allows Data Science to take root in a variety of subjects (math, social studies, science, computing, ELA, etc.), which in turn provides flexibility for schools and districts. A data-enhanced module in social studies can reinforce summary statistics and visualizations while discussing civic responsibility. A module for science can emphasize data collection and preparation. A module with stronger computing emphasis can help meet existing commitments around CSforAll or Computer Science standards. Each of these approaches incorporates the four ingredients while simultaneously supporting other curricular goals. Math, Computer Science, and the NGSS each have standards that relate to, and can be covered within, Data Science. These standards are useful for justifying inclusion of Data Science content in disciplinary classes. They are also limiting, however, if one frames Data Science only in terms of one discipline’s standards.

Beyond the details in the individual ingredients, Data Science education should leave students with several broader understandings. Students should understand that data are used to summarize the past, report on the present, and predict the future. They should understand that data science asks questions about groups of people or observations rather than individual instances, and is subject to bias due to sampling errors. They should know that the same data can be used to support conflicting narratives, and that data science can provide evidence but not proof. They should know how to support a narrative with data-based evidence, and how to document analyses to let others revisit or audit their claims. These understandings cross-cut disciplines and touch on multiple ingredients, allowing flexibility in how they are taught.

Data Science poses opportunities, pitfalls, and exciting questions for teachers, schools, districts, and states. There are challenges in curricular design, teacher professional development and support, and student learning goals. A growing list of providers is offering Data Science materials, leaving educators to sort out their own priorities within a field that is not yet firmly defined. The four-ingredient model that we propose here is based on our two decades of experience helping teachers figure out how to integrate computing and math instruction in middle and high school. It is informed by our perspectives as an interdisciplinary team of computer scientists, math-education specialists, statisticians, and college faculty engaged in equity and diversity efforts. Data Science done right has tremendous potential to address long-standing goals of broadening access to computing education, modernizing statistics and mathematics education, and creating more culturally relevant math and computing courses; done poorly, it can exacerbate existing problems and create entirely new ones. Realizing that potential requires a clear vision of what Data Science education can provide to teachers and students.

We hope our four-ingredient model helps educators have this conversation in their respective contexts. Send us an email if you'd like to speak with us. We would love to help you have this conversation in your district.

Posted June 28th, 2021