The probability of committing a Type I error that stakeholders are willing to accept for a particular statistical test on a particular set of cases. The most common alpha level used in evaluations is 5%.
The probability that a particular statistical test on a particular set of cases is committing a Type I error and falsely declaring an effect that does not really exist. In graphical terms, the alpha is the percent of the estimated normal curve of the population that falls outside of the confidence interval on either one or both sides of the curve (depending on the nature of the hypothesis being tested).
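As a rough illustration of the graphical description above, the cutoff points that leave an alpha of 5% in the tails of a normal curve can be computed with Python's standard library. This is a minimal sketch: the use of a standard normal curve (mean 0, standard deviation 1) and the function name `critical_values` are assumptions made for the example, not part of the glossary.

```python
from statistics import NormalDist

def critical_values(alpha, two_sided=True):
    """Return the z cutoffs that leave `alpha` of a standard normal curve in the tail(s)."""
    std_normal = NormalDist()  # assumed: mean 0, standard deviation 1
    if two_sided:
        # Split alpha evenly between the lower and upper tails.
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    # One-sided (upper-tail) hypothesis: all of alpha sits in one tail.
    return (std_normal.inv_cdf(1 - alpha),)

two_sided = critical_values(0.05)
one_sided = critical_values(0.05, two_sided=False)
print("two-sided cutoffs:", two_sided)
print("one-sided cutoff:", one_sided)
```

With a two-sided hypothesis the 5% is shared between both tails, so the cutoffs sit farther from the center than the single cutoff of a one-sided hypothesis.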
People who are interested in the results of the evaluation. This usually includes the funding agency and the sponsoring organization, but it may also include other groups, such as the parents of participating students or interested researchers.
When the evaluation design fails to capture the true population and implementation characteristics, thus rendering the results un-generalizable.
One individual in a group of people or other phenomena being studied (such as, in education, classes or schools).
A variable with values that are simply categories and cannot be quantified except by counting up how many cases are in each category that contains them (such as names of countries or a series of social classes). Otherwise known as a qualitative variable.
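To make the "counting up" idea concrete, the sketch below tallies how many cases fall in each category. The list of countries is hypothetical data invented for the illustration.

```python
from collections import Counter

# Hypothetical country recorded for six cases.
countries = ["Kenya", "Peru", "Kenya", "Japan", "Peru", "Kenya"]

# Counting cases per category is the only arithmetic that makes sense here;
# an "average country" would be meaningless.
frequencies = Counter(countries)
for category, count in frequencies.most_common():
    print(f"{category}: {count}")
```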
Interpretations that have been synthesized in order to extrapolate even broader meanings about the project from the data (e.g., the low test scores, examined in conjunction with low student retention rates and low motivation from the survey data, suggest that the project is not meeting its objective of providing an interesting curriculum).
The interval surrounding the mean of the sample that has a specified confidence of containing the mean of the population.
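One way such an interval might be computed is sketched below. It assumes a normal approximation (a z multiplier rather than a t multiplier, which would give a somewhat wider interval for small samples). The function name `mean_confidence_interval` and the scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval around the sample mean."""
    n = len(sample)
    sample_mean = mean(sample)
    standard_error = stdev(sample) / sqrt(n)  # sample standard deviation / sqrt(n)
    # z multiplier that leaves (1 - confidence) split across the two tails.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical test scores for ten cases.
scores = [72, 85, 78, 90, 66, 81, 77, 88, 74, 79]
low, high = mean_confidence_interval(scores)
low90, high90 = mean_confidence_interval(scores, confidence=0.90)
print(f"95% interval: {low:.1f} to {high:.1f}")
print(f"90% interval: {low90:.1f} to {high90:.1f}")
```

Lowering the specified confidence narrows the interval, and raising it widens the interval.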
A type of question that requires the respondent to compose an answer rather than select from a list of choices (as in a selected-response question).
An ordinal variable with values that can be broken down into ever-more granular numeric units (for example, age, height, and weight).
A scoring interpretation in which a test score is defined by whether a pre-specified level of accomplishment has been met.
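A minimal sketch of this kind of interpretation follows. The cut score of 70, the names, and the scores are all hypothetical. The point is that each score is judged only against the pre-specified standard, never against other test takers.

```python
def criterion_referenced(score, cut_score):
    """Classify a score against a pre-specified standard, ignoring other test takers."""
    return "met" if score >= cut_score else "not met"

CUT_SCORE = 70  # hypothetical level of accomplishment set before testing

raw_scores = {"Ana": 82, "Ben": 69, "Cy": 70}  # hypothetical test takers
results = {name: criterion_referenced(s, CUT_SCORE) for name, s in raw_scores.items()}
print(results)
```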
A categorical variable that is limited to two values that are not necessarily opposites (such as yes/no, low/high, low/very low, agree/disagree, disagree/highly disagree).
An ordinal variable with values that consist of countable finite integers that cannot be broken down into more granular units.
The amount of effect that is desired in order to support the idea that the intervention is successful.
An outcome that can be said to be at least partially the result of an intervention rather than caused by other intervening factors.
When your sampling scheme maximizes your power (by generating large samples at the primary unit of analysis) while not needlessly over-sampling at secondary units of analysis.
Sources of variability that interfere with an accurate test score and influence test results in unexpected ways.
Common sources include:
Evaluation which examines the effectiveness of the project's implementation for the sake of facilitating project improvement.
A broad description of an intended outcome.
Meanings that have been inferred and extrapolated from the data (e.g., the scores were low relative to expectations).
A systematic measurement tool for capturing some aspect of learning.
An arrangement of values of a categorical variable that has no meaningful order (such as hair color or occupation).
A variable with numeric values and a natural order. Also known as a quantitative variable.
A scoring interpretation in which a test score is defined according to how others perform on the same test.
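One common way to express this kind of interpretation is a percentile rank, sketched below. The norm group scores are hypothetical, and the convention used here (percent of scores strictly below the given score) is one of several in use.

```python
def percentile_rank(score, norm_group):
    """Percent of scores in the norm group that fall strictly below `score`."""
    below = sum(1 for other in norm_group if other < score)
    return 100 * below / len(norm_group)

# Hypothetical scores of others who took the same test.
norm_group = [55, 60, 62, 68, 70, 74, 79, 83, 88, 95]
rank = percentile_rank(79, norm_group)
print(f"A score of 79 is above {rank:.0f}% of the norm group")
```

The same raw score would receive a different rank if the norm group performed differently, which is what separates this interpretation from one based on a fixed standard.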
A more specific description of an intended outcome. Objectives are usually stated in ways that allow the amount of attainment to be measured. In education, objectives are typically about cognition, affect, or psychomotor skill.
The particular value assigned to a case on a particular variable.
An order that can be imposed on the values of a variable in a subject, where the order ranges from the highest value (such as "very interested") to the lowest value (such as "not at all interested").
Stakeholders who are engaged in project activities. For example, in a project that involves implementing a new curriculum, the participants might be the instructors teaching the new curriculum and the students receiving it.
The probability that your statistical tests will detect an effect that really exists (i.e., the probability that the test will not commit a Type II error).
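As a sketch of how this probability depends on sample size, the function below computes it for a two-sided one-sample z-test. The choice of test, the assumption of a known population standard deviation, and all numeric values are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test to detect a true mean shift of `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * sqrt(n) / sigma  # true effect expressed in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

# Hypothetical scenario: a true effect of 5 points, standard deviation of 15.
power_small = z_test_power(effect=5, sigma=15, n=20)
power_large = z_test_power(effect=5, sigma=15, n=100)
print(f"n=20:  power = {power_small:.2f}")
print(f"n=100: power = {power_large:.2f}")
```

Under these assumptions, the larger sample is far more likely to detect the same true effect.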
Primary unit of analysis
The broadest unit of analysis that you sample from and analyze distinctively due to the fact that it has distinguishing attributes that could influence intervention outcomes.
Non-quantified narrative information.
The use of systematic procedures for deriving meaning from qualitative information. It often involves an inductive, interactive, and iterative process whereby the evaluator returns to relevant audiences and data sources to confirm and/or expand the purposes of the evaluation and test conclusions.
Can be conducted on data collected using interviews, observations, and open-ended questions on content assessments, as well as on other types of instruments. Content, thematic, and cognitive analyses are some of the approaches that are used to analyze qualitative data.
Quantifiable, numerically-expressed information.
The use of computational procedures and statistical tests to examine quantitative data.
The ordering of numeric values when zero is meaningful (such as money or weight).
"The extent to which we are measuring some attribute in a systematic and therefore repeatable way" (Walsh & Betz, 1995, p. 49). For an instrument to be reliable its results must be reproducible and stable under the different conditions in which it is likely to be used. Test reliability is decreased by errors of measurement. Three commonly used types of reliability include:
Relevant information gleaned from the data collected in the evaluation.
The order of values of a variable.
Secondary unit of analysis
Subgroups of your primary unit of analysis that you sample from.
A type of question that requires the respondent to select an answer from a list of choices rather than compose an answer (as in a constructed-response question).
Individuals who have a stake or interest in a project, including the:
Evaluation which examines the project's impact in order to make a decision about its overall effectiveness.
When a statistical test falsely detects an effect that does not really exist.
When a statistical test fails to detect an effect that really exists.
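The two kinds of error can be illustrated with a small simulation. This is a sketch under stated assumptions: a two-sided z-test of "mean = 0" with a known standard deviation, normally distributed data, and hypothetical values for sample size, effect, and number of trials.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rate(true_mean, n=25, sigma=1.0, alpha=0.05, trials=4000, seed=0):
    """Share of simulated samples in which a two-sided z-test rejects 'mean = 0'."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (sum(sample) / n) * sqrt(n) / sigma
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

# No real effect: every rejection is a false detection (Type I error).
type_i_rate = rejection_rate(true_mean=0.0)
# A real effect exists: every failure to reject is a miss (Type II error).
type_ii_rate = 1 - rejection_rate(true_mean=0.5)
print(f"Type I error rate:  {type_i_rate:.3f}")
print(f"Type II error rate: {type_ii_rate:.3f}")
```

When no effect exists, the false-detection rate should land near the chosen alpha of 5%. The miss rate depends on the size of the true effect and the sample size.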
"The extent to which the test being used actually measures the characteristic or dimension we intend to measure" (Walsh & Betz, 1995, p. 58). Three traditional conceptions of validity are:
Recent thinking views validity as depending on both:
The attribute of a case on a particular variable.
An attribute of something being studied or observed that can be assigned a value.