Data Science is all about asking questions of data.
Each question a Data Scientist asks adds a chapter to the story of their research. Even if a question turns out to be a dead end, it’s valuable to share what the question was and what work was done to answer it!
-
We start by Asking Questions after reviewing and closely observing the data. These questions can come from initial wonderings, or from the results of a previous data cycle. Most questions fall into one of four categories:
-
Lookup questions - Answered just by reading the table; no further calculations are necessary. Once you find the value, you’re done! Examples of lookup questions might be “How many legs does Felix have?” or “What species is Sheba?”
-
Arithmetic questions - Answered by doing calculations (comparing, averaging, totaling, etc.) with values from a single column. Examples of arithmetic questions might be “How much does the heaviest animal weigh?” or “What is the average age of animals from the shelter?”
-
Statistical questions - These questions expect some variability in the data and account for that variability in their answers. Statistical questions often involve multiple steps to answer, and the answers aren’t black and white. When we compare two statistics, we are actually comparing two data sets. If we ask “Are dogs heavier than cats?”, we know that not every dog is heavier than every cat! We just want to know if it is generally true or generally false.
-
Questions we can’t answer - We might wonder where the animal shelter is located, or what time of year the data was gathered! But the data in the table won’t help us answer those questions, so as Data Scientists we might need to do some research beyond the data. And if nothing turns up, we simply recognize that there are limits to what we can analyze.
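The first three categories can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table below is a tiny hypothetical one (the names Felix and Sheba echo the examples above, but every value is invented), and the helper names `lookup`, `values_in`, and `mean_weight` are made up for this sketch.

```python
from statistics import mean

# Hypothetical table: every value here is invented for illustration.
animals = [
    {"name": "Sheba", "species": "cat", "age": 7, "legs": 4, "pounds": 8.0},
    {"name": "Felix", "species": "cat", "age": 16, "legs": 4, "pounds": 9.2},
    {"name": "Toggle", "species": "dog", "age": 3, "legs": 4, "pounds": 48.0},
    {"name": "Fritz", "species": "dog", "age": 4, "legs": 4, "pounds": 92.0},
    {"name": "Snuggles", "species": "tarantula", "age": 2, "legs": 8, "pounds": 0.1},
]


def lookup(table, name, col):
    """Lookup question: read one value straight from the table."""
    for row in table:
        if row["name"] == name:
            return row[col]
    raise KeyError(name)


def values_in(table, col):
    """Collect every value from a single column."""
    return [row[col] for row in table]


def mean_weight(table, species):
    """Average weight of the rows belonging to one species."""
    return mean(row["pounds"] for row in table if row["species"] == species)


# Lookup: "How many legs does Felix have?"
felix_legs = lookup(animals, "Felix", "legs")

# Arithmetic: calculations on a single column.
heaviest = max(values_in(animals, "pounds"))
average_age = mean(values_in(animals, "age"))

# Statistical: "Are dogs heavier than cats?" compares two groups, and the
# answer is about what is generally true, not about every single animal.
dogs_generally_heavier = mean_weight(animals, "dog") > mean_weight(animals, "cat")

print(felix_legs, heaviest, average_age, dogs_generally_heavier)
```

Comparing averages is just one possible way to approach the statistical question; with a sample this small, the result says very little about dogs and cats in general.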
-
Next, we Consider Data by determining which parts of the data set we need to answer our question. Sometimes we don’t have the data we need, so we conduct a survey, observe and record data, or find another existing data set. Since our data is contained in a table, it’s useful to start by asking two questions:
-
What rows do we care about? - Is it all the animals? Just the lizards?
-
What columns do we need? - Are we examining the ages of the animals? Their weights?
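Choosing rows and columns can be sketched as two small steps: filter the rows, then keep only the columns of interest. The table and the helper names `filter_rows` and `select_columns` below are hypothetical, chosen only for this illustration.

```python
# Hypothetical table: every value here is invented for illustration.
animals = [
    {"name": "Sheba", "species": "cat", "age": 7, "pounds": 8.0},
    {"name": "Lizzy", "species": "lizard", "age": 2, "pounds": 0.4},
    {"name": "Rex", "species": "lizard", "age": 5, "pounds": 0.6},
    {"name": "Toggle", "species": "dog", "age": 3, "pounds": 48.0},
]


def filter_rows(table, keep):
    """What rows do we care about? Keep rows for which keep(row) is true."""
    return [row for row in table if keep(row)]


def select_columns(table, columns):
    """What columns do we need? Keep only the named columns in each row."""
    return [{col: row[col] for col in columns} for row in table]


# Just the lizards, and only their names and ages.
lizards = filter_rows(animals, lambda row: row["species"] == "lizard")
lizard_ages = select_columns(lizards, ["name", "age"])

print(lizard_ages)
```

Keeping the two steps separate mirrors the two questions above: the row filter and the column list can each be changed without touching the other.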
-
Then, we Analyze the Data, by completing calculations, creating data displays, creating new tables, or filtering existing tables. The results of this step are calculations, patterns, and relationships.
-
Are we making a pie chart? A bar chart? Something else?
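Whichever display we choose, the calculation behind it is often the same. As a rough sketch, assuming a hypothetical species column with invented values, the counts below are what a bar chart would show, and the shares are what a pie chart would show. The text-only bars stand in for a real plotting tool.

```python
from collections import Counter

# Hypothetical species column: these values are invented for illustration.
species = ["cat", "cat", "dog", "dog", "dog", "lizard"]

# Bar chart data: how many animals of each species.
counts = Counter(species)

# Pie chart data: each species as a fraction of the whole.
total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# A text-only bar chart, most common species first.
for name, n in counts.most_common():
    print(f"{name:<8}{'#' * n} {n} ({shares[name]:.0%})")
```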
-
Finally, we Interpret the Data, by answering our original question and summarizing the process we took and the results we found. Sometimes the data cycle ends here, but often these interpretations lead to new questions… and the cycle begins again.
These materials were developed partly through support of the National Science Foundation (awards 1042210, 1535276, 1648684, 1738598, 2031479, and 1501927).
Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.