- argument
-
an input to a function; the expressions for the arguments follow the function name
- bar chart
-
a display of categorical data that uses bars positioned over category values; each bar’s height reflects the count or percentage of data values in that category
- Boolean
-
a type of data with two values: true and false
- box plot
-
a display of the distribution of a dataset based on the five-number summary: minimum, first quartile, median, third quartile, and maximum; also known as a box-and-whisker plot
- categorical data
-
data whose values are qualities that are not subject to the laws of arithmetic.
- contract
-
a statement of the name, domain, and range of a function
- contract error
-
an error in which the code makes sense but uses a function with the wrong number or type of arguments
- data row
-
a structured piece of data in a dataset that typically reports all the information gathered about a given individual
- data science
-
the science of collecting, organizing, and drawing general conclusions from data, with the help of computers
- data types
-
a way of classifying values, such as: Number, String, Image, Boolean, or any user-defined data structure
- definitions area
-
the left-most text box in the Editor where definitions for values and functions are written
- domain
-
the type or set of inputs that a function expects
- editor
-
software in which code can be written and evaluated
- error message
-
information from the computer about errors in code
- example
-
shows the use of a function on specific inputs and the computation the function should perform on those inputs
- explanatory variable
-
any variable that could impact the "response variable", generally plotted on the x-axis of a scatter plot
- form
-
the aspect of a relationship between two quantitative variables that describes whether the two variables vary together linearly or in some other way
- frequency
-
how often a particular value appears in a dataset
- function
-
a mathematical object that consumes inputs and produces an output
- function definition
-
code that names a function, lists its variables, and states the expression to compute when the function is used
- grouped sample
-
a non-random subset of individuals chosen from a larger set, where the individuals belong to a specific group
- header
-
the titles of the columns of a table, usually shown at the top
- histogram
-
a display of quantitative data that uses vertical bars positioned over bins (sub-intervals); each bar’s height reflects the count or percentage of data values in that bin.
- identifier column
-
a column of unique values that identify the individual rows (e.g., student IDs, SSNs)
- interactions area
-
the right-most text box in the Editor, where expressions are entered to be evaluated
- interquartile range
-
a measure of spread, abbreviated IQR, based on dividing a dataset into four parts. The values that separate the parts are called the first quartile (Q1), the median, and the third quartile (Q3). IQR is calculated as Q3 minus Q1.
- line of best fit
-
summarizes the relationship (if linear) between two quantitative variables
- linear regression
-
a type of analysis that models the relationship between two quantitative variables. The result is known as a regression line, or line of best fit.
- mean
-
a representation of the center, or 'typical' value, of a set of numbers, calculated as the sum of those numbers divided by how many numbers there are.
- median
-
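the middle value of a quantitative dataset when its values are listed in order; if the dataset has an even number of values, the mean of the two middle values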
- method
-
a function that is only associated with an instance of a data type, which consumes inputs and produces an output based on that instance
- mode
-
the most commonly appearing categorical or quantitative value or values in a dataset
- name
-
how we refer to a function or value defined in a language (examples: +, *, star, circle)
- operator
-
a symbol that manipulates two Numbers and produces a result
- outlier
-
an observation whose value is very different from the other observations in the same dataset, perhaps due to experimental error. Outliers can also indicate data belonging to a different population from the rest of the sample.
- pie chart
-
a display that uses areas of a circular pie’s slices to show percentages in each category
- predictor function
-
a function which, given a value from one dataset, makes an educated guess at a related value in a different dataset
- programming language
-
a set of rules for writing code that a computer can evaluate
- purpose statement
-
a concise, detailed description of what a function does with its inputs
- quantitative data
-
number values for which arithmetic makes sense
- quartiles
-
three values that divide a dataset into four equal-sized groups
- r
-
a number between −1 and 1 that measures the direction and strength of a linear relationship between two quantitative variables (also known as correlation value)
- range
-
the type or set of outputs that a function produces
- range of a dataset
-
the distance between the minimum and maximum values, calculated as the maximum minus the minimum
- response variable
-
the variable in a relationship that is presumed to be affected by the explanatory variable, generally plotted on the y-axis of a scatter plot
- sample
-
a set of individuals or objects collected or selected from a statistical population by a defined procedure
- scatter plot
-
a display of the relationship between two quantitative variables, graphing each explanatory value on the x-axis and the accompanying response on the y-axis
- shape
-
the aspect of a dataset, visible in a histogram or box plot, that describes which values are more or less common
- skew
-
lack of balance in a dataset’s shape, arising from more values that are unusually low or high. Such values tend to trail off, rather than be separated by a gap (as with outliers).
- skewed left
-
A distribution is skewed left if there are a few values that are fairly low compared to the others. A histogram of data that is skewed left will have a clump of taller bars on the right, with smaller ones trailing off to the left, like the shape of the toes on a left foot.
- skewed right
-
A distribution is skewed right if there are a few values that are fairly high compared to the bulk of data values. A histogram of data that is skewed right will have a clump of taller bars on the left, with smaller ones trailing off to the right, like the shape of the toes on a right foot.
- spread
-
the extent to which values in a dataset vary, either from one another or from the center
- statistical inference
-
using information from a sample to draw conclusions about the larger population from which the sample was taken
- symmetric
-
A symmetric distribution has a balanced shape, showing that it’s just as likely for the variable to take lower values as higher values.
- syntax
-
the set of rules that defines a language, whether it be spoken, written, or programmed.
- syntax error
-
an error in which the computer cannot make sense of the code (e.g., missing commas, missing parentheses, unclosed strings)
- threats to validity
-
factors that can undermine the conclusion of a study
- variable
-
a letter or symbol that stands in for a value or expression
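
Several of the statistical terms defined above (the five-number summary behind a box plot, quartiles, interquartile range, range of a dataset, mean, and median) can be illustrated with a minimal Python sketch. The function name `five_number_summary` and the `ages` values are hypothetical, chosen only for illustration. The quartile rule used here (the median of each half, leaving out the middle value when the count is odd) is an assumption; other conventions exist and can give slightly different quartiles.

```python
from statistics import mean, median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle value when the count is odd.
    Requires at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the middle
    upper = ordered[-half:]   # values above the middle
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


# Hypothetical dataset with one unusually high value.
ages = [1, 2, 3, 4, 5, 6, 7, 8, 40]

minimum, q1, med, q3, maximum = five_number_summary(ages)
iqr = q3 - q1                   # interquartile range: Q3 minus Q1
data_range = maximum - minimum  # range of a dataset: maximum minus minimum
average = mean(ages)            # mean: sum of the values divided by their count

print("five-number summary:", (minimum, q1, med, q3, maximum))
print("IQR:", iqr)
print("range:", data_range)
print("mean:", average)
```

In this example the single high value (40) pulls the mean above the median, which is the pattern described under "skewed right".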