Data 151 - Midterm #1
Study Guide for the first midterm of the Willamette Data 151 class
Term | Definition |
---|---|
Data | recorded observations or measurements collected for reference or analysis |
Datum | a single entry, element, or instance in a data set; the smallest unit of data |
Observations | the objects or units described by the data; each observation is typically one row of a data matrix |
Variables | the characteristics measured or recorded for each observation; typically the columns of a data matrix |
Numeric (Quantitative) | a variable that can take a wide range of numerical values, and it is sensible to add, subtract, or take averages with those values |
Categorical (Qualitative) | a variable whose values are categories that divide the observations into groups |
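Continuous | a numeric variable that can take any value within an interval, including fractions and decimals |
Discrete | a numeric variable that can take only separated values with gaps between them, typically counts (0, 1, 2, ...) |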
Ordinal | a categorical variable whose levels have a natural ordering |
Nominal | a categorical variable whose levels have no natural ordering |
Level | one of the possible values (categories) of a categorical variable |
Binary | a variable that can take only two possible values (for example, yes/no) |
Data Matrix (Data Frame) | a convenient and common way to organize data as a table in which each row is an observation and each column is a variable |
Tidy Data | A data frame where each row is a unique case (observational unit), each column is a variable, and each cell is a single value |
Big Data | data that contains greater variety, arriving in increasing volumes and with more velocity |
Volume | the amount of data that exists |
Variety | the diversity of data types |
Velocity | the speed with which data is generated |
Veracity | the quality and accuracy of data |
Population | the entire group of individuals or objects about which one wants to draw conclusions |
Parameter | a number that describes something about an entire population |
Sample | a subset of the population from which data are actually collected |
Statistic | a number that describes something about a sample, often used to estimate a population parameter |
Census | the complete enumeration of a population or group at a point in time |
Sampling Unit | the building block of a data set; an individual member of the population, a cluster of members, or some other predefined unit |
Sampling Frame | a list of the items or people in the population from which the sample is actually drawn |
Target Population | the group of individuals that the researchers intend to study and draw conclusions about |
Bias | any systematic influence that skews the results of a study away from the truth |
Biased Sampling | occurs when some members of a population are systematically more likely to be selected in a sample than others |
Undercoverage Bias | when some groups in the population are left out of the process of choosing the sample |
Nonresponse Bias | when an individual chosen for the sample can't be contacted or refuses to participate |
Response Bias | when an individual does not answer honestly |
Simple Random Sample | most basic random sample; is equivalent to drawing names out of a hat to select cases |
Stratified Random Sample | a divide-and-conquer sampling strategy; the population is divided into groups called strata; the strata are chosen so that similar cases are grouped together, then a second sampling method, usually simple random sampling, is employed within each stratum |
Cluster Sample | break up the population into many groups, called clusters; then we sample a fixed number of clusters and include all observations from each of those clusters in the sample |
Multistage Sample | like a cluster sample, but rather than keeping all observations in each cluster, we would collect a random sample within each selected cluster |
Convenience Sample | a sample in which individuals who are easily accessible are more likely to be included |
Voluntary Sample | researcher puts out a request for members of a population to join the sample, and people decide whether or not to be in the sample |
Anecdotal Evidence | data collected in a haphazard fashion, often a few striking cases that may not represent the population |
Response Variable | the variable one suspects is affected by the explanatory variable |
Explanatory Variable | the variable whose effect one wants to study |
Observational Study | observes individuals and measures variables of interest but does not attempt to influence the responses |
Experiment | deliberately imposes some treatment on individuals to measure their response |
Confounding Variable | a variable associated with both the explanatory and response variables, so that its effect on the response cannot be distinguished from the effect of the explanatory variable |
Experimental Unit | the smallest collection of individuals to which treatments are applied |
Treatment | a specific condition applied to the individuals in an experiment |
Control | the group that receives no treatment (or a placebo) and serves as a baseline for comparison; may also be referred to as the "no treatment" group |
Randomization | researchers randomize participants into treatment groups to account for variables that cannot be controlled |
Replication | repeating an experiment, or a part of it, under the same or similar conditions, for example by applying each treatment to many experimental units |
Blocking | when researchers know or suspect that a variable other than the treatment influences the response, they first group individuals into blocks based on that variable and then randomize cases within each block to the treatment groups |
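Placebo Effect | an improvement in patients that results from receiving a fake treatment (a placebo) rather than from the real treatment; placebos are typically given to patients in the control group |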
Blinding | participants do not know which treatment group they are in; only the researcher is aware of which group each participant belongs to |
Double Blinding | where doctors or researchers who interact with patients are, just like the patients, unaware of who is or is not receiving the treatment |
Completely Randomized Design | where the treatments are assigned completely at random so that each experimental unit has the same chance of receiving any one treatment |
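Randomized Complete Block Design | experimental units are grouped into blocks of equal size, each block's size equals the number of treatments, and every treatment is randomly assigned to exactly one unit within each block |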
Matched Pairs Design | an experimental design where participants are matched in pairs based on shared characteristics before they are assigned to groups |
Console | a computer terminal where a user may input commands and view output such as the results |
Script | a saved series of commands that are executed sequentially and used to automate work |
Global Environment | in R, the top-level workspace that stores the objects (variables, data frames, functions) created during a session |
Variable Environment | a user-definable value that can affect the way running processes will behave on a computer |
Vectors | a tuple of one or more values called scalars |
Permutation | an arrangement of some or all of the items in a list in a specific order; the number of permutations counts how many such arrangements are possible |
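
The sampling strategies defined in the table (simple random, stratified, cluster, and multistage) can be compared side by side in a short sketch. This is only an illustration in Python using the standard `random` module, not the course's own code. The student population, the `dorm` and `year` fields, the function names, and the seed are all hypothetical.

```python
import random
from collections import defaultdict


def group_by(population, key_of):
    """Split the population into groups keyed by key_of(case)."""
    groups = defaultdict(list)
    for case in population:
        groups[key_of(case)].append(case)
    return groups


def simple_random_sample(population, n, rng):
    """Draw n cases without replacement; every case is equally likely."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group similar cases into strata, then take a simple random sample in each stratum."""
    strata = group_by(population, stratum_of)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample


def cluster_sample(population, cluster_of, n_clusters, rng):
    """Randomly pick whole clusters and keep every observation in them."""
    clusters = group_by(population, cluster_of)
    sample = []
    for key in rng.sample(sorted(clusters), n_clusters):
        sample.extend(clusters[key])
    return sample


def multistage_sample(population, cluster_of, n_clusters, n_per_cluster, rng):
    """Randomly pick clusters, then take a simple random sample within each one."""
    clusters = group_by(population, cluster_of)
    sample = []
    for key in rng.sample(sorted(clusters), n_clusters):
        sample.extend(rng.sample(clusters[key], n_per_cluster))
    return sample


# Hypothetical population: 12 students spread over 3 dorms and 2 class years.
population = [{"id": i, "dorm": "ABC"[i % 3], "year": 1 + i % 2} for i in range(12)]
rng = random.Random(151)  # fixed seed so the sketch is repeatable

srs = simple_random_sample(population, 4, rng)
strat = stratified_sample(population, lambda s: s["year"], 2, rng)
clus = cluster_sample(population, lambda s: s["dorm"], 1, rng)
multi = multistage_sample(population, lambda s: s["dorm"], 2, 2, rng)
```

Note the difference in what is random: the stratified sample always contains every stratum (here, both class years), whereas the cluster and multistage samples contain only the clusters that happened to be drawn.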