- argument
-
the inputs to a function; the expressions for each argument follow the function name
- average
-
a representation of the center, or 'typical' value, of a set of numbers, calculated as the sum of those numbers divided by the number of values.
- bar chart
-
a display of categorical data that uses bars positioned over category values; each bar’s height reflects the count or percentage of data values in that category
- bias
-
prejudice in favor of or against one outcome, person, or group compared with another, usually in a way considered to be unfair.
- bin
-
a range that values from a dataset can belong to; there is one bar in a histogram per bin
- Boolean
-
a type of data with two values: true and false
- box plot
-
a display (also known as a box-and-whisker plot) of a distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum
- categorical data
-
data whose values are qualities that are not subject to the laws of arithmetic
- comments
-
messages in the code, generally ignored by the computer, to help people interacting with the code understand what it is doing
- conditional
-
a code expression made of questions and answers
- contract
-
a statement of the name, domain, and range of a function
- contract error
-
errors where the code makes sense, but uses a function with the wrong number or type of arguments
- correlation
-
a single number somewhere between -1 and +1 that reports the direction and strength of the linear relationship between two quantitative variables (also known as the r-value)
- data row
-
a structured piece of data in a dataset that typically reports all the information gathered about a given individual
- data science
-
the science of collecting, organizing, and drawing general conclusions from data, with the help of computers
- data type
-
a way of classifying values, such as: Number, String, Image, Boolean, or any user-defined data structure
- dataset
-
a collection of related information that is composed of separate elements, but can be manipulated as a unit by a computer
- define
-
to associate a descriptive name with a value
- definitions area
-
the left-most text box in the Editor where definitions for values and functions are written
- direction
-
the relationship between two quantitative variables: either they increase/decrease together or one may increase while the other decreases
- distribution
-
a description of the number of times or relative probabilities that different quantities occur in a sample
- domain
-
the type or set of inputs that a function expects
- editor
-
software in which code can be written and evaluated
- error message
-
information from the computer about errors in code
- example
-
shows the use of a function on specific inputs and the computation the function should perform on those inputs
- explanatory variable
-
any variable that could impact the "response variable", generally plotted on the x-axis of a scatter plot
- form
-
the shape of a relationship between two quantitative variables: whether the two variables together vary linearly or in some other way
- frequency
-
how often a particular value appears in a dataset
- function
-
a relation from a set of inputs to a set of possible outputs, where each input is related to exactly one output
- function definition
-
code that names a function, lists its variables, and states the expression to compute when the function is used
- grouped sample
-
a non-random subset of individuals chosen from a larger set, where the individuals belong to a specific group
- header
-
the titles of each column of a table, usually shown at the top
- histogram
-
a display of quantitative data that uses vertical bars positioned over bins (sub-intervals); each bar’s height reflects the count or percentage of data values in that bin.
- identifier column
-
a column of unique values that identify the individual rows (e.g., student IDs, SSNs)
- interactions area
-
the right-most text box in the Editor, where expressions are entered to be evaluated
- interquartile range
-
one possible measure of spread (abbreviated IQR), based on dividing a sorted dataset into four parts. The values that separate the parts are called the first quartile (Q1), the median, and the third quartile (Q3). The IQR is calculated as Q3 minus Q1.
- line of best fit
-
summarizes the relationship (if linear) between two quantitative variables
- linear regression
-
a type of analysis that models the relationship between two quantitative variables. The result is known as a regression line, or line of best fit.
- linear relationship
-
sequences that change at a constant rate, or points forming a straight line on a graph
- maximum
-
the largest value in a dataset
- mean
-
a representation of the center, or 'typical' value, of a set of numbers, calculated as the sum of those numbers divided by the number of values.
- median
-
the middle value of a sorted quantitative dataset; when the number of values is even, the mean of the two middle values
- method
-
a function that is only associated with an instance of a data type, which consumes inputs and produces an output based on that instance
- minimum
-
the smallest value in a dataset
- mode
-
the most commonly appearing categorical or quantitative value or values in a dataset
- name
-
how we refer to a function or value defined in a language (examples: +, *, star, circle)
- null hypothesis
-
the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.
- operator
-
a symbol that manipulates two Numbers and produces a result
- outlier
-
observations whose values are very different from the other observations in the same dataset, perhaps due to experimental error. Outliers can also be indicative of data belonging to a different population from the rest of the established samples.
- percentage
-
a ratio showing the parts per hundred
- pie chart
-
a display that uses areas of a circular pie’s slices to show percentages in each category
- piecewise function
-
a function that computes different expressions based on its input
- predictor function
-
a function which, given a value from one dataset, makes an educated guess at a related value in a different dataset
- programming language
-
a set of rules for writing code that a computer can evaluate
- purpose statement
-
a concise, detailed description of what a function does with its inputs
- quantitative data
-
number values for which arithmetic makes sense
- quartile
-
each of four equal groups into which a population can be divided according to the distribution of values of a particular variable.
- r
-
a number between −1 and 1 that measures the direction and strength of a linear relationship between two quantitative variables (also known as correlation value)
- random sample
-
a subset of individuals chosen from a larger set, such that each individual has the same probability of being chosen
- range
-
the type or set of outputs that a function produces
- range of a dataset
-
the distance between minimum and maximum values
- ratio
-
the relative sizes of two or more values
- response variable
-
the variable in a relationship that is presumed to be affected by the explanatory variable, generally plotted on the y-axis of a scatter plot
- sample
-
a set of individuals or objects collected or selected from a statistical population by a defined procedure
- sample size
-
the number of participants or observations included in a study
- scatter plot
-
a display of the relationship between two quantitative variables, graphing each explanatory value on the x axis and the accompanying response on the y axis
- shape
-
the aspect of a dataset, visible in a histogram or box plot, that describes which values are more or less common
- skew
-
lack of balance in a dataset’s shape, arising from more values that are unusually low or high. Such values tend to trail off, rather than be separated by a gap (as with outliers).
- skewed left
-
A distribution is skewed left if there are a few values that are fairly low compared to the others. A histogram of data that is skewed left will have a clump of taller bars on the right, with smaller ones trailing off to the left, like the shape of the toes on a left foot.
- skewed right
-
A distribution is skewed right if there are a few values that are fairly high compared to the bulk of data values. A histogram of data that is skewed right will have a clump of taller bars on the left, with smaller ones trailing off to the right, like the shape of the toes on a right foot.
- slope
-
the steepness of a straight line on a graph
- spread
-
the extent to which values in a dataset vary, either from one another or from the center
- standard deviation
-
a number that measures spread of a dataset using the typical distance of values from their mean
- statistical inference
-
using information from a sample to draw conclusions about the larger population from which the sample was taken
- strength
-
of a relationship between two quantitative variables: how much the value of one variable tells us about the value of the other
- symmetric
-
A symmetric distribution has a balanced shape, showing that it’s just as likely for the variable to take lower values as higher values.
- syntax error
-
errors where the computer cannot make sense of the code (e.g., missing commas, missing parentheses, unclosed strings)
- threats to validity
-
factors that can undermine the conclusion of a study
- value
-
a specific piece of data, like 5 or "hello"
- variable
-
a name or symbol that stands for some value or expression, often a value or expression that changes
- y-intercept
-
the point where a line or curve crosses the y-axis of a graph
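
As an illustration of the measures of center and spread defined above (mean, median, mode, range of a dataset, interquartile range, standard deviation), here is a minimal Python sketch. The `ages` list is invented for this example. The quartile rule used here (medians of the lower and upper halves of the sorted data) is one common convention and an assumption; other tools may compute quartiles slightly differently.

```python
import statistics

# Hypothetical sample data, invented for illustration only.
ages = [2, 3, 3, 5, 7, 8, 10, 12]

mean = statistics.mean(ages)        # sum of the values divided by how many there are
median = statistics.median(ages)    # middle value of the sorted data
modes = statistics.multimode(ages)  # most commonly appearing value(s)
data_range = max(ages) - min(ages)  # distance between minimum and maximum

# Quartiles (assumed convention): Q1 and Q3 are the medians of the
# lower and upper halves of the sorted data.
ordered = sorted(ages)
half = len(ordered) // 2
q1 = statistics.median(ordered[:half])
q3 = statistics.median(ordered[-half:])
iqr = q3 - q1                       # interquartile range: Q3 minus Q1

# Population standard deviation: typical distance of values from the mean.
std_dev = statistics.pstdev(ages)

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)
print("range:", data_range)
print("Q1, Q3, IQR:", q1, q3, iqr)
print("standard deviation:", round(std_dev, 2))
```

For this sample the mean is 6.25 and the median is 6.0, so the two measures of center can differ even on a small dataset.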
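
The entries for correlation (r), line of best fit, slope, y-intercept, and predictor function can be tied together with a short Python sketch. It assumes the line of best fit is the least-squares line, which is the usual meaning in linear regression. The `hours` and `scores` data and all function names are hypothetical, chosen only for this example.

```python
import math

# Hypothetical paired data, invented for illustration only.
hours = [1, 2, 3, 4, 5]        # explanatory variable (x-axis)
scores = [52, 60, 61, 70, 77]  # response variable (y-axis)


def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


def correlation(xs, ys):
    """Return r, a number between -1 and +1."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_fit(xs, ys):
    """Return (slope, y_intercept) of the least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


r = correlation(hours, scores)
slope, y_intercept = best_fit(hours, scores)


def predict(x):
    """Predictor function: estimate the response for a given explanatory value."""
    return slope * x + y_intercept


print("r:", round(r, 3))
print("slope:", slope, "y-intercept:", y_intercept)
print("predicted score for 6 hours:", predict(6))
```

A positive slope and an r close to +1 together indicate a strong, positive linear relationship in this invented data.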
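
The programming terms above (contract, purpose statement, example, function definition, conditional, piecewise function) can also be seen side by side in a small sketch. It is written in Python purely for illustration; the `ticket_price` function, its prices, and its age cutoffs are hypothetical.

```python
# Contract (name, domain, range), written here as a comment plus type hints:
# ticket_price : Number -> Number
def ticket_price(age: int) -> int:
    """Purpose statement: given an age in years, return the ticket price in dollars."""
    # Conditional: the questions are asked in order, and the answer paired
    # with the first true question is returned. Because the expression
    # computed depends on the input, this is a piecewise function.
    if age < 13:
        return 5
    elif age < 65:
        return 10
    else:
        return 7


# Examples: the function used on specific inputs, with the expected results.
assert ticket_price(8) == 5
assert ticket_price(30) == 10
assert ticket_price(70) == 7

print(ticket_price(8), ticket_price(30), ticket_price(70))
```

Calling `ticket_price("eight")` would be closer to a contract error than a syntax error: the code is well formed, but the argument has the wrong type.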