- arguments
-
the inputs to a function; expressions for arguments follow the name of a function
- bar chart
-
a display of categorical data that uses bars positioned over category values; each bar’s height reflects the count or percentage of data values in that category
- box plot
-
a display of the distribution of a data set based on its five-number summary: minimum, first quartile, median, third quartile, and maximum (also known as a box-and-whisker plot)
- categorical data
-
data whose values are qualities that are not subject to the laws of arithmetic
- contract
-
a statement of the name, domain, and range of a function
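As a rough illustration, a contract can be mirrored in Python with type hints. The function name `double` and the `name :: Domain -> Range` comment notation are assumptions chosen for this sketch; the glossary itself does not prescribe a language or notation.

```python
# Contract, written as a comment in "name :: Domain -> Range" form:
# double :: Number -> Number
def double(n: float) -> float:
    """Consume a number and produce twice that number."""
    return 2 * n


# The type hints restate the contract: the domain is float, the range is float.
print(double.__annotations__)
print(double(4))
```

The name, domain, and range are all visible before reading the function body, which is the point of writing the contract first.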
- data row
-
a structured piece of data in a dataset that typically reports all the information gathered about a given individual
- data science
-
the science of collecting, organizing, and drawing general conclusions from data, with the help of computers
- definitions area
-
the left-most text box in the Editor where definitions for values and functions are written
- design recipe
-
a sequence of steps that helps people document, test, and write functions
- domain
-
the type or set of inputs that a function expects
- editor
-
software in which you can write and evaluate code
- explanatory variable
-
the variable in a relationship that is presumed to impact the other variable
- form
-
of a relationship between two quantitative variables: whether the two variables together vary linearly or in some other way
- frequency
-
how often a particular value appears in a data set
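A minimal sketch of counting frequencies in Python, using the standard library's `collections.Counter`. The `species` list is hypothetical data made up for this example.

```python
from collections import Counter

# Hypothetical categorical data: species of animals at a shelter.
species = ["dog", "cat", "dog", "rabbit", "cat", "dog"]

# Counter maps each distinct value to the number of times it appears.
frequency = Counter(species)

for value, count in frequency.most_common():
    # Relative frequency: the share of all data values taking this value.
    print(value, count, f"{count / len(species):.0%}")
```

These counts (or percentages) are what the bar heights in a bar chart would show.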
- function
-
a mathematical object that consumes inputs and produces an output
- grouped sample
-
a non-random subset of individuals chosen from a larger set, where the individuals belong to a specific group
- header
-
the row of column titles in a table, usually shown at the top
- histogram
-
a display of quantitative data that uses vertical bars positioned over bins (sub-intervals); each bar’s height reflects the count or percentage of data values in that bin
- identifier column
-
a column of unique values that identify the individual rows (e.g., student IDs or SSNs)
- interactions area
-
the right-most text box in the Editor, where expressions are entered to evaluate
- interquartile range
-
(IQR) one possible measure of spread, based on dividing a sorted data set into four equal-sized parts. The values that separate the parts are called the first quartile (Q1), the median, and the third quartile (Q3). The IQR is calculated as Q3 minus Q1.
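The calculation can be sketched in Python as follows. This sketch assumes the "median of each half" convention for locating Q1 and Q3; other conventions exist and can give slightly different quartiles. The helper name `quartiles` and the sample data are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # the lower half (middle value excluded if the count is odd)
    upper = ordered[-half:]   # the upper half (middle value excluded if the count is odd)
    return median(lower), median(ordered), median(upper)


data = [1, 3, 5, 7, 9, 11, 13, 15]
q1, med, q3 = quartiles(data)
iqr = q3 - q1
print(q1, med, q3, iqr)
```

For this data the lower half is 1, 3, 5, 7 and the upper half is 9, 11, 13, 15, so Q1 is 4, Q3 is 12, and the IQR is 8.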
- line of best fit
-
a straight line that summarizes the relationship (if linear) between two quantitative variables
- linear regression
-
modeling the relationship between two quantitative variables using a straight line
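A minimal sketch of fitting such a line in Python. It assumes the least-squares criterion, which this glossary does not specify, and the names `fit_line` and `predict` as well as the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical explanatory (xs) and response (ys) values.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)


def predict(x):
    """Estimate the response for a given explanatory value using the fitted line."""
    return slope * x + intercept


print(slope, intercept, predict(5))
```

The fitted line doubles as a simple predictor: given an explanatory value, `predict` returns the response the line suggests.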
- mean
-
average, calculated as the sum of values divided by the number of values
- median
-
the middle value of a quantitative data set when its values are sorted; with an even number of values, the mean of the two middle values
- method
-
a function that belongs to an instance of a datatype; it consumes inputs and produces an output based on that instance
- mode
-
the most commonly appearing categorical or quantitative value or values in a data set
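The three measures of center defined above (mean, median, and mode) can be compared side by side with Python's standard `statistics` module. The sample values below are hypothetical.

```python
from statistics import mean, median, multimode

# Hypothetical quantitative data.
values = [2, 3, 3, 5, 7, 10]

print(mean(values))       # sum of the values divided by the number of values
print(median(values))     # middle of the sorted values (mean of the two middle ones here)
print(multimode(values))  # list of the most commonly appearing value(s)

# The mode also applies to categorical data, and there can be more than one.
colors = ["red", "blue", "red", "blue", "green"]
print(multimode(colors))
```

`multimode` returns a list because a data set can have several values tied for most common, as the `colors` example shows.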
- outlier
-
a data point that is unusually far above or below most of the others
- pie chart
-
a display that uses areas of a circular pie’s slices to show percentages in each category
- predictor function
-
a function which, given a value from one data set, makes an educated guess at a related value in a different data set
- programming language
-
a set of rules for writing code that a computer can evaluate
- quantitative data
-
number values for which arithmetic makes sense
- quartiles
-
three values that divide a data set into four equal-sized groups
- r
-
a number between −1 and 1 that measures the direction and strength of a linear relationship between two quantitative variables (also known as correlation value)
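A minimal sketch of computing r in Python, assuming the standard (Pearson) correlation formula; the function name `correlation` and the data are hypothetical.

```python
from math import sqrt


def correlation(xs, ys):
    """Return r, the correlation between paired values xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Dividing by the two spreads scales the result into the range -1 to 1.
    return sxy / sqrt(sxx * syy)


rising = correlation([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly linear, increasing
falling = correlation([1, 2, 3, 4], [8, 6, 4, 2])   # perfectly linear, decreasing
print(rising, falling)
```

The sign of r gives the direction of the relationship, and its distance from 0 gives the strength: the two perfectly linear examples sit at the extremes, 1 and −1.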
- range
-
the type or set of outputs that a function produces
- range of a data set
-
the distance between minimum and maximum values
- response variable
-
the variable in a relationship that is presumed to be affected by the other variable
- scatter plot
-
a display of the relationship between two quantitative variables, graphing each explanatory value on the x-axis and the accompanying response on the y-axis
- shape
-
the aspect of a dataset that tells which values are more or less common
- skew
-
lack of balance in a dataset’s shape, arising from more values that are unusually low or high. Such values tend to trail off, rather than be separated by a gap (as with outliers).
- skewed left
-
A distribution is skewed left if there are a few values that are fairly low compared to the bulk of data values. A display of the data will show a longer tail to the left.
- skewed right
-
A distribution is skewed right if there are a few values that are fairly high compared to the bulk of data values. A display of the data will show a longer tail to the right.
- spread
-
the extent to which values in a data set vary, either from one another or from the center
- statistical inference
-
using information from a sample to draw conclusions about the larger population from which the sample was taken
- symmetric
-
A symmetric distribution has a balanced shape, showing that it’s just as likely for the variable to take lower values as higher values.
- threats to validity
-
factors that can undermine the conclusion of a study