Note: We also provide a bilingual glossary, which defines all vocabulary words across our lessons in English and Spanish.
- Boolean
a type of data with two values: true and false
- S
the standard deviation of the residuals, which measures how far a model's regression line is, overall, from the data points
- asymptote
A straight line to which a curve gets closer and closer - but never touches - as one of the variables approaches infinity.
- axis of symmetry
a line on either side of which a line, curve, or shape is the reflection of the other
- bar chart
a display of categorical data that uses bars positioned over category values; each bar's height reflects the count or percentage of data values in that category
- base
In expressions like 9³ and 7², the 9 and 7 are called bases. They are the factors that will be multiplied by themselves the number of times indicated by the exponent.
- bias
prejudice in favor of or against one outcome, person, or group compared with another, usually in a way considered to be unfair.
- bin
a range that values from a dataset can belong to; there is one bar in a histogram per bin
- box plot
the box plot (a.k.a. box-and-whisker plot) is a way of displaying a distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum
- categorical data
data whose values are qualities that are not subject to the laws of arithmetic
- circle of evaluation
a diagram of the structure of a mathematical expression (Bootstrap-specific)
- comments
messages in the code, generally ignored by the computer, to help people interacting with the code understand what it is doing
- conditional
an expression containing a list of boolean-producing "questions" (known as conditions), along with code to run if each condition is met
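As an illustration of the conditional entry above, here is a minimal Python sketch. The function name `ticket_price`, the age cutoffs, and the prices are all made up for this example; Bootstrap's own syntax is not shown here.

```python
def ticket_price(age: int) -> float:
    """Return a ticket price based on age (cutoffs and prices are invented)."""
    if age < 5:        # first condition: a question that produces True or False
        return 0.0
    elif age < 18:     # only asked when the first condition was False
        return 8.0
    else:              # runs when none of the earlier conditions is met
        return 12.0
```

Each condition is checked in order, and only the code attached to the first condition that is met gets run.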
- confounding variable
an unaccounted-for variable that influences both of the variables being analyzed and confuses the interpretation of the relationship between them
- continuous
"Uninterrupted". On a graph, continuous functions appear as an unbroken curve. Some functions are continuous everywhere, while others may only be continuous over a specific interval.
- contract
a statement of the name, domain, and range of a function (Bootstrap-specific)
- contract error
errors where the code makes sense, but uses a function with the wrong number or type of arguments (Bootstrap-specific)
- convenience sample
a method of collecting data whereby the individuals in the sample are selected not because they are most representative of the entire population, but because they are most easily accessible
- correlation
the degree to which knowing the value of an explanatory variable helps us predict the value of another, response, variable
- data
pieces of information about a group of individuals or things
- data row
a structured piece of data in a dataset that typically reports all the information gathered about a given individual
- data science
the science of collecting, organizing, and drawing general conclusions from data, with the help of computers
- data type
a way of classifying values, such as: Number, String, Image, Boolean, or any user-defined data structure
- define
to associate a descriptive name with a value
- definitions area
the left-most text box in the Editor where definitions for values and functions are written (Bootstrap-specific)
- dependent variable
When modeling a relationship between an input and an output (e.g. - distance over time), we are curious about how a change in the input (typically graphed on the x-axis) impacts the output (y). When the output is entirely dependent on the input, we refer to the output as the "dependent variable".
- direction
the aspect of a linear relationship that tells if the line relating the two variables is sloping up or down
- domain
the type or set of inputs a function expects, i.e., the independent variable(s) that govern the output of the function
- editor
software in which code can be written and evaluated
- example
shows the use of a function on specific inputs and the computation the function should perform on those inputs
- explanatory variable
When modeling a possible relationship between an input and an output (e.g. - height and age), we are curious about how a change in the input (typically graphed on the x-axis of a scatter plot) might "explain" the output (y). When the behavior of the output may be explained by the input, we refer to the input as the "explanatory variable".
- exponential decay
A sequence in which each number is multiplied by a constant amount - between zero and one - to produce the next, causing the sequence to decay rapidly at first and then slow to smaller and smaller reductions
- exponential growth
A sequence in which each number is multiplied by a constant amount - greater than one - to produce the next, causing the sequence to grow slowly at first and then switch to rapidly-accelerating increases
- exponential relationship
A mathematical relation between two variables x and y, in which the dominant term is raised to the power of x, and the y-values grow by a constant factor over equal intervals in the x-values. When graphed, an exponential relationship appears as a 'hockey stick' curve (sloping up or down). Exponential functions occur widely in the natural and social sciences, as in a self-reproducing population or a fund accruing compound interest.
- form
the shape of a relationship between two quantitative variables: whether the two variables together vary linearly or in some other way
- frequency
how often a particular value appears in a dataset
- function
a relation from a set of inputs to a set of possible outputs, where each input is related to exactly one output
- function definition
code that names a function, lists its variables, and states the expression to compute when the function is used
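The contract, purpose statement, example, and function definition entries fit together. The sketch below shows one way they might look in Python; the function `double` is hypothetical, and writing the contract as a comment is an assumption of this example rather than Bootstrap's own notation.

```python
# Contract (name, domain, range), written here as a comment:
# double : Number -> Number
def double(n: float) -> float:
    """Purpose statement: produce twice the given number."""
    return 2 * n

# Examples: specific inputs paired with the results the function should produce
assert double(3) == 6
assert double(0.5) == 1.0
```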
- grouped sample
a non-random subset of individuals chosen from a larger set, where the individuals belong to a specific group
- growth factor
the amount each term in an exponential sequence is multiplied by to get the next term (either 1 plus the growth rate or 1 minus the decay rate)
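To make the growth factor concrete, the following Python sketch builds a short sequence by repeated multiplication. The helper name `exponential_sequence` and the starting value and rates are assumptions chosen for illustration.

```python
def exponential_sequence(start, rate, n_terms):
    """Build a sequence in which each term is the previous term times (1 + rate).

    A positive rate gives growth; a negative rate (a decay rate) gives decay.
    """
    factor = 1 + rate                      # the growth factor
    terms = [start]
    for _ in range(n_terms - 1):
        terms.append(terms[-1] * factor)   # multiply by the same factor each step
    return terms

growth = exponential_sequence(100, 0.5, 4)    # growth factor 1.5
decay = exponential_sequence(100, -0.5, 4)    # growth factor 0.5
```

Dividing any term by the one before it recovers the growth factor, which is one way to check whether a sequence is exponential.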
- header row
the titles of each column of a table, usually shown at the top
- histogram
a display of quantitative data that uses vertical bars positioned over bins (or 'intervals'); each bar's height reflects the count of values in that bin.
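The counts behind a histogram can be computed by assigning each value to a bin. This is a minimal Python sketch, assuming equal-width bins that start at multiples of the bin width; the function name `bin_counts` and the data are made up.

```python
def bin_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width    # left edge of the bin containing v
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))         # bins in increasing order

ages = [1, 4, 5, 7, 12, 13, 19]
counts = bin_counts(ages, 5)
```

Each key is a bin's left edge and each count would be the height of that bin's bar.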
- identifier column
a column of unique values which identify all the individual rows (e.g. - student IDs, SSNs, etc.)
- independent variable
When modeling a relationship between an input and an output (e.g. - distance over time), we are curious about how a change in the input (typically graphed on the x-axis) impacts the output (y). When the output is entirely dependent on the input, we refer to the input as the "independent variable".
- interactions area
the right-most text box in the Editor, where expressions are entered to be evaluated (Bootstrap-specific)
- interquartile range
(IQR) is the range of the middle 50% of the data. This measure of spread is found by dividing a dataset into four parts. The values that divide each part are called the first quartile (Q1), the median, and third quartile (Q3). IQR is calculated as Q3 minus Q1.
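The IQR calculation can be sketched in Python as follows. This example assumes the "median of each half" method for finding quartiles and at least two data values; other tools use slightly different quartile rules and may give slightly different answers. The function name `quartiles` and the data are made up.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]                  # middle value excluded when the count is odd
    upper_half = ordered[len(ordered) - half:]
    return median(lower_half), median(ordered), median(upper_half)

data = [1, 3, 4, 6, 7, 9, 10]
q1, med, q3 = quartiles(data)
iqr = q3 - q1    # Q3 minus Q1
```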
- line of best fit
summarizes the relationship (if linear) between two quantitative variables in such a way as to minimize the errors overall when using explanatory values to predict responses
- linear regression
a type of analysis that models the relationship between two quantitative variables. The result is known as a regression line, or line of best fit.
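One common way to find a regression line is the least-squares method, sketched below in plain Python. The glossary does not say which method is used, so treat least squares as an assumption; the function name `fit_line` and the data are invented for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x    # the line passes through (mean_x, mean_y)
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

Predictions for new explanatory values then come from `slope * x + intercept`.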
- linear relationship
a mathematical relation between two quantitative variables x and y such that y changes by a constant amount (the slope) for every unit increase in x. When graphed, a linear relationship appears as a straight line (sloping up or down).
- logarithmic relationship
A mathematical relation between two variables, x and y, in which a base must be raised to the power of y in order to equal the value of x. When graphed, a logarithmic relationship appears as a reflection of an exponential function across the diagonal line y = x.
- maxima
the largest value(s) in a selected range of a set (e.g. - the outputs of a function or a list of values)
- mean
a representation of the center, or 'typical' value in a set of numbers, calculated as the sum of those numbers divided by the number of values.
- median
the middle value (or average of the two middle values) in an ordered list of quantitative data
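The mean and median can disagree when a dataset contains an unusually large or small value. A small Python sketch using the standard `statistics` module, with made-up data:

```python
from statistics import mean, median

values = [2, 3, 3, 4, 18]         # 18 is much larger than the other values

center_mean = mean(values)        # sum of the values divided by how many there are
center_median = median(values)    # middle value of the ordered list
```

Here the single large value pulls the mean above most of the data, while the median stays with the bulk of the values.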
- minima
the smallest value(s) in a selected range of a set (e.g. - the outputs of a function or a list of values)
- mode
the most commonly appearing categorical or quantitative value or values in a dataset
- model
a simplified representation of variables in a dataset, used to predict the value of one of those variables based on one or more of the other variables in the dataset
- name
how we refer to a function or value defined in a language (examples: +, *, star, circle)
- null hypothesis
The claim that "there's nothing special going on" in the larger population from which we are sampling. This claim could mean "in general, the coin isn't biased towards heads or tails" or "in general, breed doesn't affect how long it takes for dogs to be adopted."
- operator
a symbol that manipulates two Numbers and produces a result
- outlier
observations whose values are very different from the other observations in the same dataset, perhaps due to experimental error. Outliers can also be indicative of data belonging to a different population from the rest of the established samples.
- parabola
A symmetric, U-shaped curve drawn by a quadratic relationship, or by intersecting a plane with a circular cone.
- percentage
a ratio showing the parts per hundred
- pie chart
a display that uses areas of a circular pie's slices to show percentages in each category
- piecewise function
in contrast to conventional functions that have a single "rule" used for every input in their domain, these functions divide their domain into "pieces" and use a different rule for each piece.
- predictor function
a model of patterns in data. These models reduce the complexity of the data, predicting the value of the response variable as if completely dependent on the explanatory (independent) variable.
- programming language
a set of rules for writing code that a computer can evaluate
- purpose statement
a concise, detailed description of what a function does with its inputs
- quadratic relationship
A mathematical relation between two variables, x and y, in which the dominant term is squared. When graphed, a quadratic relationship appears as a parabola.
- quantitative data
number values for which arithmetic makes sense
- quartile
one of 3 data points that divide the data set into four equal groups. For example, the data point that separates the 1st quarter of data from the 2nd quarter of data is called Q1, the first quartile or the lower quartile.
- r
a number between −1 and 1 that measures the direction and strength of a linear relationship between two quantitative variables (also known as correlation value)
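For readers curious how r is computed, the sketch below uses the standard Pearson correlation formula in plain Python. The function name `correlation` and the data are assumptions for illustration, and the glossary itself does not specify a formula.

```python
from math import sqrt

def correlation(xs, ys):
    """Return the correlation value r for paired lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])      # points on a line sloping up
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])    # points on a line sloping down
```

The sign of the result reports the direction of the relationship, and its distance from zero reports the strength.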
- random sample
a subset of individuals chosen from a larger set, such that each individual has an equal probability of being chosen
- range
the type or set of outputs that a function produces, i.e., the dependent variable(s)
- range of a dataset
the distance between minimum and maximum values (not to be confused with the range of a function!)
- ratio
the relative sizes of two or more values
- residual
when fitting a model to data, the residual is the difference between the predicted value and the actual value of the response variable. The predicted value comes from the regression line and the actual value comes from the dataset.
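Residuals, and the value S defined earlier in this glossary, can be sketched in Python as follows. Two assumptions are made here: residuals are taken as actual minus predicted (a common sign convention), and S divides by n - 2, the usual choice for a line with a slope and an intercept. The line y = 2x + 1 and the data are invented for illustration.

```python
from math import sqrt

def predict(x):
    """Hypothetical regression line y = 2x + 1, chosen only for illustration."""
    return 2 * x + 1

xs = [1, 2, 3, 4]
ys = [3.5, 4.5, 7.5, 8.5]    # made-up observed responses

# One residual per data point: actual value minus predicted value (assumed convention)
residuals = [y - predict(x) for x, y in zip(xs, ys)]

# S: standard deviation of the residuals, assuming the n - 2 divisor
s = sqrt(sum(r ** 2 for r in residuals) / (len(xs) - 2))
```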
- response variable
the variable in a relationship, generally plotted on the y-axis of a scatter plot, that is presumed to be affected by the explanatory variable; in some contexts the response variable is referred to as the "dependent variable" or the "output"
- sample
a set of individuals or objects collected or selected from a statistical population by a defined procedure
- sample size
the number of individuals (people or things) for which data is gathered in a study
- scatter plot
a display of the relationship between two quantitative variables, graphing each explanatory value on the x axis and the accompanying response on the y axis
- selection bias
failure to ensure that the sample obtained will be representative of the population intended to be studied
- shape
The aspect of a dataset - visible in a histogram or box plot - that describes which values are more or less common.
- skew
lack of balance in a dataset's shape, arising from more values that are unusually low or high. Such values tend to trail off, rather than be separated by a gap (as with outliers).
- skewed left
A distribution is skewed left if there are a few values that are fairly low compared to the others. A histogram of data that is skewed left will have a clump of taller bars on the right, with smaller ones trailing off to the left, like the shape of the toes on a left foot.
- skewed right
A distribution is skewed right if there are a few values that are fairly high compared to the bulk of data values. A histogram of data that is skewed right will have a clump of taller bars on the left, with smaller ones trailing off to the right, like the shape of the toes on a right foot.
- slope
the steepness of a straight line on a graph reported as a number which tells how much y changes for every unit increase in x
- spread
the aspect of a dataset that describes the extent to which values vary, either from one another or from the center
- standard deviation
a number that measures spread of a dataset using the typical distance of values from their mean
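The sketch below computes a standard deviation by hand in Python and imports the standard library's `pstdev` for comparison. It assumes the population version, which divides by the number of values; the sample version divides by one less and gives a slightly larger result. The function name `population_sd` and the data are made up.

```python
from math import sqrt
from statistics import pstdev

def population_sd(values):
    """Square root of the average squared distance of the values from their mean."""
    mean = sum(values) / len(values)
    squared_distances = [(v - mean) ** 2 for v in values]
    return sqrt(sum(squared_distances) / len(squared_distances))

scores = [2, 4, 4, 4, 5, 5, 7, 9]
sd = population_sd(scores)
```

Calling `pstdev(scores)` should give the same value as `population_sd(scores)`, which is a quick way to check the hand calculation.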
- statistical inference
using information from a sample to draw conclusions about the larger population from which the sample was taken
- strength
of a relationship between two quantitative variables: how much the values of one variable tell us about the values of the other
- symmetric
A symmetric distribution has a balanced shape, showing that it's just as likely for the variable to take lower values as higher values.
- syntax error
errors where the computer cannot make sense of the code (e.g. - missing commas, missing parentheses, unclosed strings)
- testing table
a carefully-selected sample from a dataset, which is designed to test a particular table operation for correctness (Bootstrap-specific)
- threats to validity
factors that can undermine the conclusion of a study
- validity
the degree to which a study both measures what it is supposed to and then draws conclusions from the data gathered that are reasonable and supported by the data
- value
a specific piece of data, like 5 or "hello"
- variable
a name or symbol that stands for some value or expression, often a value or expression that changes
- vertex
a point where two or more lines or curves meet (in a parabola, the vertex is the maximum or minimum)
- vertical shift
the amount that the graph of a function is shifted up (positive) or down (negative)
- x-intercept
the point where a line or curve crosses the x-axis of a graph (also called the 'root' or 'zero' because this is the x-value for which y=0)
- y-intercept
the point where a line or curve crosses the y-axis of a graph