Glossary


argument: an input to a function; the expressions for the arguments follow the function name
bar chart: a display of categorical data that uses bars positioned over category values; each bar’s height reflects the count or percentage of data values in that category
Boolean: a type of data with two values: true and false
box plot: (a.k.a. box-and-whisker plot) a way of displaying a distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum
categorical data: data whose values are qualities that are not subject to the laws of arithmetic.
contract: a statement of the name, domain, and range of a function
contract error: errors where the code makes sense, but uses a function with the wrong number or type of arguments
data row: a structured piece of data in a dataset that typically reports all the information gathered about a given individual
data science: the science of collecting, organizing, and drawing general conclusions from data, with the help of computers
data types: a way of classifying values, such as: Number, String, Image, Boolean, or any user-defined data structure
definitions area: the left-most text box in the Editor where definitions for values and functions are written
domain: the type or set of inputs that a function expects
editor: software in which code can be written and evaluated
error message: information from the computer about errors in code
example: shows the use of a function on specific inputs and the computation the function should perform on those inputs
explanatory variable: any variable that could impact the "response variable", generally plotted on the x-axis of a scatter plot
form: the shape of the relationship between two quantitative variables: whether the two variables vary together linearly or in some other way
frequency: how often a particular value appears in a dataset
function: a mathematical object that consumes inputs and produces an output
function definition: code that names a function, lists its variables, and states the expression to compute when the function is used
grouped sample: a non-random subset of individuals chosen from a larger set, where the individuals belong to a specific group
header: the title of each column of a table, usually shown at the top
histogram: a display of quantitative data that uses vertical bars positioned over bins (sub-intervals); each bar’s height reflects the count or percentage of data values in that bin.
identifier column: a column of unique values that identify the individual rows (e.g., student IDs or SSNs)
interactions area: the right-most text box in the Editor, where expressions are entered to be evaluated
interquartile range: (IQR) one possible measure of spread, based on dividing a dataset into four parts. The values that separate the parts are called the first quartile (Q1), the median, and the third quartile (Q3). IQR is calculated as Q3 minus Q1.
line of best fit: summarizes the relationship (if linear) between two quantitative variables
linear regression: a type of analysis that models the relationship between two quantitative variables. The result is known as a regression line, or line of best fit.
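mean: a representation of the center, or 'typical' value, of a set of numbers, calculated as the sum of those numbers divided by how many numbers there are.
median: the middle value of a quantitative dataset once its values are sorted; when the number of values is even, the mean of the two middle values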
method: a function that is only associated with an instance of a data type, which consumes inputs and produces an output based on that instance
mode: the most commonly appearing categorical or quantitative value or values in a dataset
name: how we refer to a function or value defined in a language (examples: +, *, star, circle)
operator: a symbol that manipulates two Numbers and produces a result
outlier: observations whose values are very different from the other observations in the same data set, perhaps due to experimental error. Outliers can also be indicative of data belonging to a different population from the rest of the established samples.
pie chart: a display that uses areas of a circular pie’s slices to show percentages in each category
predictor function: a function which, given a value from one dataset, makes an educated guess at a related value in a different dataset
programming language: a set of rules for writing code that a computer can evaluate
purpose statement: a concise, detailed description of what a function does with its inputs
quantitative data: number values for which arithmetic makes sense
quartiles: three values that divide a dataset into four equal-sized groups
r: a number between −1 and 1 that measures the direction and strength of a linear relationship between two quantitative variables (also known as correlation value)
range: the type or set of outputs that a function produces
range of a dataset: the distance between the minimum and maximum values
response variable: the variable in a relationship that is presumed to be affected by the explanatory variable, generally plotted on the y-axis of a scatter plot
sample: a set of individuals or objects collected or selected from a statistical population by a defined procedure
scatter plot: a display of the relationship between two quantitative variables, graphing each explanatory value on the x-axis and the accompanying response on the y-axis
shape: the aspect of a dataset, visible in a histogram or box plot, that describes which values are more or less common.
skew: lack of balance in a dataset’s shape, arising from more values that are unusually low or high. Such values tend to trail off, rather than be separated by a gap (as with outliers).
skewed left: A distribution is skewed left if there are a few values that are fairly low compared to the others. A histogram of data that is skewed left will have a clump of taller bars on the right, with smaller ones trailing off to the left, like the shape of the toes on a left foot.
skewed right: A distribution is skewed right if there are a few values that are fairly high compared to the bulk of data values. A histogram of data that is skewed right will have a clump of taller bars on the left, with smaller ones trailing off to the right, like the shape of the toes on a right foot.
spread: the extent to which values in a dataset vary, either from one another or from the center
statistical inference: using information from a sample to draw conclusions about the larger population from which the sample was taken
symmetric: A symmetric distribution has a balanced shape, showing that it’s just as likely for the variable to take lower values as higher values.
syntax: the set of rules that defines a language, whether it be spoken, written, or programmed.
syntax error: errors where the computer cannot make sense of the code (e.g., missing commas, missing parentheses, or unclosed strings)
threats to validity: factors that can undermine the conclusion of a study
variable: a letter or symbol that stands in for a value or expression
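
Worked example: several entries above (mean, median, mode, quartiles, interquartile range, range of a dataset, and r) describe calculations, so a short sketch may help make them concrete. It is written in Python, which is an assumption, since this glossary does not name a language. All function names and the sample data are hypothetical. Quartile conventions vary between tools; this sketch assumes the "median of each half" convention, leaving out the middle value when the number of values is odd.

```python
from collections import Counter
from math import sqrt


def mean(values):
    """Sum of the values divided by how many values there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (mean of the two middle values if the count is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Every value that appears most often; there may be more than one."""
    counts = Counter(values)            # frequency of each value
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


def quartiles(values):
    """Return (Q1, median, Q3), assuming the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]           # values before the middle position
    upper = ordered[(n + 1) // 2:]      # values after the middle position
    return median(lower), median(ordered), median(upper)


def iqr(values):
    """Interquartile range: Q3 minus Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


def data_range(values):
    """Distance between the minimum and maximum values."""
    return max(values) - min(values)


def correlation(xs, ys):
    """r: direction and strength of the linear relationship between xs and ys."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up data, for illustration only.
ages = [2, 3, 3, 5, 7, 8, 14]
hours = [1, 2, 3, 4]       # explanatory variable
scores = [2, 4, 6, 8]      # response variable

print(mean(ages), median(ages), modes(ages))
print(quartiles(ages), iqr(ages), data_range(ages))
print(correlation(hours, scores))
```

For the made-up ages, the mean is 6, the median is 5, the only mode is 3, the quartiles are 3, 5, and 8, the IQR is 5, and the range is 12. The hours and scores lie exactly on a rising line, so r comes out as 1.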