## OERL: Definitions of Terms

Alpha level
The maximum probability of committing a Type I error that stakeholders are willing to accept for a particular statistical test on a particular set of cases. The most common alpha level used in evaluations is 5%.

Alpha
The probability that a particular statistical test on a particular set of cases is committing a Type I error, that is, falsely declaring an effect that does not really exist. In graphical terms, alpha is the percentage of the estimated normal curve of the population that falls outside the confidence interval on one or both sides of the curve (depending on the nature of the hypothesis being tested).
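
The graphical description above can be checked numerically. The following is a minimal sketch that assumes a standard normal curve; the function name `alpha_outside_interval` and the boundary values 1.96 and 1.645 are illustrative choices, not part of the original definition.

```python
from statistics import NormalDist


def alpha_outside_interval(z_bound, two_sided=True):
    """Share of a standard normal curve lying beyond z_bound.

    Assumption: the interval is expressed in standard-deviation units
    (z-scores) of a standard normal distribution.
    """
    one_tail = 1 - NormalDist().cdf(z_bound)
    # A two-sided hypothesis counts both tails; a one-sided one counts a single tail.
    return 2 * one_tail if two_sided else one_tail


two_sided_alpha = alpha_outside_interval(1.96)                    # close to 0.05
one_sided_alpha = alpha_outside_interval(1.645, two_sided=False)  # close to 0.05
```

Both calls give a value near the conventional 5% alpha level, one by splitting the area across two tails and the other by placing it all in one tail.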

Audiences
People who are interested in the results of the evaluation. This usually includes the funding agency and the sponsoring organization, but it may also include other groups, such as the parents of participating students or interested researchers.

Bias
When the evaluation design fails to capture the true population and implementation characteristics, thus rendering the results ungeneralizable.

Case
One individual in a group of people or other phenomena being studied (such as classes or schools in education).

Categorical variable
A variable with values that are simply categories and cannot be quantified except by counting how many cases fall into each category (such as names of countries or a series of social classes). Also known as a qualitative variable.
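
As a small illustration of counting cases per category, the sketch below tallies a made-up list of country names; the data and variable names are hypothetical.

```python
from collections import Counter

# Hypothetical values of a categorical variable, one per case.
countries = ["Kenya", "Peru", "Kenya", "Japan", "Peru", "Kenya"]

# The only quantification available is the number of cases in each category.
counts = Counter(countries)
largest_category, largest_count = counts.most_common(1)[0]
```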

Conclusions
Interpretations that have been synthesized in order to extrapolate even broader meanings about the project from the data (e.g., the low test scores, examined in conjunction with low student retention rates and low motivation from the survey data, suggest that the project is not meeting its objective of providing an interesting curriculum).

Confidence interval
The interval surrounding the mean of the sample that has a specified confidence of containing the mean of the population.
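
One common way to build such an interval is sketched below. It assumes a normal approximation (a z-based interval); small samples would usually call for a t distribution instead. The function name and the sample scores are illustrative only.

```python
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) bounds around the sample mean.

    Assumption: the sampling distribution of the mean is treated as normal.
    """
    center = mean(sample)
    standard_error = stdev(sample) / len(sample) ** 0.5
    # z-score that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return center - margin, center + margin


scores = [72, 85, 78, 90, 66, 81, 77, 88, 74, 79]  # hypothetical test scores
low, high = mean_confidence_interval(scores)
low_99, high_99 = mean_confidence_interval(scores, confidence=0.99)
```

Asking for more confidence (99% rather than 95%) widens the interval around the same sample mean.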

Constructed response
A type of question that requires the respondent to compose an answer rather than select one from a list of choices (contrast with selected response).

Continuous variable
An ordinal variable with values that can be broken down into ever-more granular numeric units (for example, age, height, and weight).

Criterion-Referenced
A scoring interpretation in which a test score is defined by whether a pre-specified level of accomplishment has been met.

Dichotomous variable
A categorical variable that is limited to two values that are not necessarily opposites (such as yes/no, low/high, low/very low, agree/disagree, disagree/highly disagree).

Discrete variable
An ordinal variable with values that consist of countable finite integers that cannot be broken down into more granular units.

Effect size
The amount of effect that is desired in order to support the idea that the intervention is successful.
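
Effect sizes are often expressed in standardized units so that they can be compared across measures. The sketch below uses Cohen's d (the difference in group means divided by the pooled standard deviation), which is one common convention and an assumption here, not something the definition above prescribes. The data are made up.

```python
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference between two groups (Cohen's d)."""
    n_t, n_c = len(treatment), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / pooled_variance ** 0.5


observed_d = cohens_d([2, 4, 6], [1, 3, 5])  # 0.5: means differ by half a pooled SD
```

An evaluator could compare an observed value like this against the effect size that stakeholders agreed in advance would count as success.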

Effect
An outcome that can be said to be at least partially the result of an intervention rather than caused by other intervening factors.

Efficiency
When your sampling scheme maximizes your power (by generating large samples at the primary unit of analysis) while not needlessly over-sampling at secondary units of analysis.

Errors of Measurement
Sources of variability that interfere with an accurate test score and influence test results in unexpected ways.
Common sources include:

• characteristics of test takers
• characteristics and behavior of the test administrator
• characteristics of the testing environment
• scoring accuracy

Formative Evaluation
Evaluation which examines the effectiveness of the project's implementation for the sake of facilitating project improvement.

Goal
A broad description of an intended outcome.

Interpretations
Meanings that have been inferred and extrapolated from the data (e.g., the scores were low relative to expectations).

Learning Assessment
A systematic measurement tool for capturing some aspect of learning.

Nominal scale
An arrangement of values of a categorical variable that has no meaningful order (such as hair color or occupation).

Numeric variable
A variable with numeric values and a natural order. Also known as a quantitative variable.

Norm-Referenced
A scoring interpretation in which a test score is defined according to how others perform on the same test.
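
The contrast between norm-referenced and criterion-referenced interpretation can be sketched with the same raw score read two ways. The function names, the cutoff of 75, and the norm-group scores are hypothetical, and the percentile rank here counts only scores strictly below the given score, which is one of several conventions.

```python
def meets_criterion(score, cutoff):
    """Criterion-referenced: has the pre-specified level been reached?"""
    return score >= cutoff


def percentile_rank(score, norm_group_scores):
    """Norm-referenced: percentage of the norm group scoring below this score."""
    below = sum(1 for other in norm_group_scores if other < score)
    return 100 * below / len(norm_group_scores)


norm_group = [50, 60, 70, 80, 90]       # hypothetical scores of other test takers
passed = meets_criterion(70, cutoff=75)  # False: below the required level
rank = percentile_rank(70, norm_group)   # 40.0: higher than 40% of the norm group
```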

Objective
A more specific description of an intended outcome. Objectives are usually stated in ways that allow the amount of attainment to be measured. In education, objectives are typically about cognition, affect, or psychomotor skill.

Observation
The particular value assigned to a case on a particular variable.

Ordinal scale
An order that can be imposed on the values of a variable, ranging from the highest value (such as "very interested") to the lowest value (such as "not at all interested").

Participants
Stakeholders who are engaged in project activities. For example, in a project that involves implementing a new curriculum, the participants might be the instructors teaching the new curriculum and the students receiving it.

Power
The probability that your statistical tests will detect an effect that really exists (i.e., the probability that the test will not commit a Type II error).
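
To show how power depends on sample size and effect size, here is a minimal sketch for a two-sided, one-sample z-test. It assumes a known standard deviation and an effect size expressed in standard-deviation units; the function name and the example numbers are illustrative.

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test.

    Assumptions: known standard deviation, and effect_size is the true mean
    shift divided by that standard deviation.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - critical) + std_normal.cdf(-shift - critical)


power_small_sample = z_test_power(0.5, n=16)
power_large_sample = z_test_power(0.5, n=64)
power_no_effect = z_test_power(0.0, n=30)  # with no real effect, this equals alpha
```

Larger samples raise power for the same effect size, which is why sampling plans are usually sized with a target power in mind.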

Primary unit of analysis
The broadest unit of analysis that you sample from and analyze separately because it has distinguishing attributes that could influence intervention outcomes.

Qualitative Data
Non-quantified narrative information.

Qualitative Analysis
The use of systematic procedures for deriving meaning from qualitative information. It often involves an inductive, interactive, and iterative process whereby the evaluator returns to relevant audiences and data sources to confirm and/or expand the purposes of the evaluation and test conclusions.

Qualitative analysis can be conducted on data collected using interviews, observations, and open-ended questions on content assessments, as well as on other types of instruments. Content, thematic, and cognitive analyses are some of the approaches used to analyze qualitative data.

Quantitative Data
Quantifiable, numerically-expressed information.

Quantitative Analysis
The use of computational procedures and statistical tests to examine quantitative data.

Ratio scale
The ordering of numeric values when zero is meaningful (such as money or weight).

Reliability
"The extent to which we are measuring some attribute in a systematic and therefore repeatable way" (Walsh & Betz, 1995, p. 49). For an instrument to be reliable its results must be reproducible and stable under the different conditions in which it is likely to be used. Test reliability is decreased by errors of measurement. Three commonly used types of reliability include:

• test-retest reliability: the degree to which a score on one instrument is equivalent to the score on the same or a parallel instrument
• internal consistency reliability: the degree to which items within an instrument correlate with each other
• inter-rater reliability: the degree to which the measuring instrument yields similar results at the same time with more than one assessor
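
Internal consistency is often summarized with Cronbach's alpha. The sketch below is one way to compute it; the choice of this statistic, the function name, and the item scores are assumptions made for illustration and are not specified by the definition above.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a set of items.

    item_scores: one list per item, each holding the scores of the same
    respondents in the same order.
    """
    k = len(item_scores)
    # Total score per respondent across all items.
    totals = [sum(per_respondent) for per_respondent in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Three hypothetical items answered by four respondents.
consistent_items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
mixed_items = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4]]

alpha_consistent = cronbach_alpha(consistent_items)
alpha_mixed = cronbach_alpha(mixed_items)
```

Items that rank respondents identically give the maximum value, while items that disagree with one another pull the statistic down.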

Results
Relevant information gleaned from the data collected in the evaluation.

Scale
The order of values of a variable.

Secondary unit of analysis
Subgroups of your primary unit of analysis that you sample from.

Selected response
A type of question that requires the respondent to select an answer from a list of choices rather than compose an answer (contrast with constructed response).

Stakeholders
Individuals who have a stake or interest in a project, including the:

• funder of the project (e.g., NSF)
• sponsoring organization that hosts the project (e.g., a university or research and development institution) and typically hires the evaluator
• internal administrators of the project
• participants in the project
• project audiences

Summative Evaluation
Evaluation which examines the project's impact in order to make a decision about its overall effectiveness.

Type I error
When a statistical test falsely detects an effect that does not really exist.

Type II error
When a statistical test fails to detect an effect that really exists.
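
Both kinds of error can be observed in a small simulation. The sketch below repeatedly draws samples and applies a two-sided z-test of the hypothesis that the mean is zero. The known standard deviation of 1, the sample size, the true effect of 0.5, and the fixed seed are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, mean


def rejects_null(sample, sigma=1.0, alpha=0.05):
    """Two-sided z-test of 'the population mean is 0' with known sigma."""
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def rejection_rate(true_mean, n=25, trials=2000, seed=0):
    """Fraction of simulated studies in which the test declares an effect."""
    rng = random.Random(seed)
    hits = sum(
        rejects_null([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return hits / trials


# No real effect: every rejection is a Type I error (rate should sit near alpha).
type_i_rate = rejection_rate(true_mean=0.0)
# A real effect of 0.5: every failure to reject is a Type II error.
type_ii_rate = 1 - rejection_rate(true_mean=0.5)
```

The Type I rate is governed by the chosen alpha level, while the Type II rate shrinks as the sample size or the true effect grows.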

Validity
"The extent to which the test being used actually measures the characteristic or dimension we intend to measure" (Walsh & Betz, 1995, p. 58). Three traditional conceptions of validity are:

• content validity: the degree to which a test's content is tied to the instructional domain it intends to measure
• criterion validity: the degree to which a test predicts some criterion
• construct validity: the degree to which a test measures the theoretical construct it intends to measure

Recent thinking views validity as depending on both:

• evidential basis: the interpretability, relevance, and utility of test scores
• consequential basis: the value implications of test scores as a basis of action and the social consequences of using these scores

Value
The attribute of a case on a particular variable.

Variable
An attribute of something being studied or observed that can be assigned a value.