Step 4: Decide what, if any, comparison outcome data will be collected (P, R).
(P) = plan example
(R) = report example
If the intervention is supposed to solve a problem better than existing alternatives,
you need a way to demonstrate that it in fact does (or does not). The obvious way to do
so is to compare your intervention with those alternatives so that if you observe what
appears to be an effect, you can more plausibly argue that it is a true effect and not
merely the result of other factors associated with the setting.
The concepts of internal validity and external validity are important to discussions of
comparative evaluation. Having internal validity means that the findings truly measure the
effect of the intervention rather than the effects of other variables that might affect
the outcome. Having external validity means that the findings are generalizable to the
entire population of individuals for whom the intervention is intended. To achieve internal
and external validity, the evaluation must compare outcomes gathered in intervention and
nonintervention contexts and include a sample of schools, teachers, and students that are
representative of the populations that the district serves.
There are different evaluation designs that make possible such comparisons. In
choosing which design to use, your underlying goal is to identify the best design for
assessing the true impact that the program might have while at the same time ruling out
alternative explanations for the results of your evaluation. To do this, use a design
that will allow you to simulate what would have happened to members of the intervention
group if they had not received the intervention or received an alternative intervention
(also known as the counterfactual condition).
Thus you want to make sure that your design helps you achieve the following:
- Prior to the introduction of the intervention, members of the intervention and
comparison groups are as equivalent as possible in characteristics that might be
associated with the outcome measure. Examples of characteristics are prior skills
and abilities, or demographic characteristics such as age, gender, or family
background.
- Both groups are exposed to the same external influences that might impact
performance on the outcome measure that you are using to provide evidence of whether
the intervention is having the intended impacts. Examples of external influences include
student or teacher attrition and changes in district policies that affect the
characteristics of classroom instruction.
If the groups are equivalent at the start of the study, and they are exposed to the
same external influences during the study, you can be more confident that any
difference in the outcomes for these groups was caused by exposure to the intervention
and not by some other factor such as differences between the groups in their prior
academic achievement or changes in district policy that affected one group but not the
other.
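To make the equivalence check concrete, the following sketch (in Python, not part of the original text) compares the two groups on one pre-intervention characteristic and reports the standardized difference between the group means. The participant records and the "prior_score" field are hypothetical placeholders; an actual evaluation would examine several characteristics.

    # Minimal sketch (assumed example, not from the source): checking baseline
    # equivalence between an intervention group and a comparison group on a
    # pre-intervention characteristic. Field names are hypothetical placeholders.
    import statistics

    def baseline_gap(records, group_key, value_key):
        """Return group means and the standardized mean difference for one characteristic."""
        groups = {"intervention": [], "comparison": []}
        for r in records:
            groups[r[group_key]].append(r[value_key])
        m_i = statistics.mean(groups["intervention"])
        m_c = statistics.mean(groups["comparison"])
        pooled_sd = statistics.pstdev(groups["intervention"] + groups["comparison"])
        return m_i, m_c, ((m_i - m_c) / pooled_sd) if pooled_sd else 0.0

    # Example: students with a prior achievement score and a group label.
    students = [
        {"group": "intervention", "prior_score": 72},
        {"group": "intervention", "prior_score": 68},
        {"group": "comparison", "prior_score": 70},
        {"group": "comparison", "prior_score": 66},
    ]
    print(baseline_gap(students, "group", "prior_score"))

A large standardized difference on a characteristic such as prior achievement would signal that the groups were not equivalent at the start of the study.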
Experimental designs
Properly implemented experimental designs provide the strongest evidence of the
effectiveness of a program. Because experimental designs require that the participants
be randomly assigned to intervention and comparison groups, these designs provide the
highest assurance that the groups are equivalent. Then, when a difference is
observed between outcomes in the two groups, the strongest case can be made that the
difference is due to participants' exposure to the intervention rather than to differences in
the characteristics of the participants that existed prior to the introduction of the
intervention. When, for example, schools, teachers, or students with similar characteristics
are assigned purely at random to the intervention and comparison groups, each participant
will have an equal chance of being assigned to either the intervention or comparison group.
If the two groups are large enough, researchers can assume that the different types of schools,
teachers, or students will be evenly distributed among the groups.
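As an illustration of how such an assignment might be carried out, the sketch below (Python, with hypothetical school names) shuffles a list of schools and splits it evenly, so that each school has an equal chance of landing in either group.

    # Minimal sketch (assumed example, not from the source): randomly assigning
    # schools to intervention and comparison groups. School names are placeholders.
    import random

    def randomly_assign(units, seed=None):
        """Shuffle the units and split them evenly into two groups."""
        rng = random.Random(seed)
        shuffled = units[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        return {"intervention": shuffled[:half], "comparison": shuffled[half:]}

    schools = ["School A", "School B", "School C", "School D", "School E", "School F"]
    print(randomly_assign(schools, seed=42))

Recording the seed (or otherwise documenting the assignment procedure) makes it possible to show later that the assignment was in fact random.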
As powerful as experimental designs are, the need to randomly assign participants to
groups can create challenges for evaluators studying certain interventions in certain
settings. For example, participation in an intervention may need to be voluntary or,
conversely, all target population members may be required to participate. Also, by the
time the evaluator gets involved, it may be too late for random assignment because the
participants have already been selected or the program is already in place. In school
districts, an experimental design is more likely to be feasible if the district can limit
the intervention to only some of the schools, teachers, or students, while offering an
alternative program to the others. Experimental designs can also be used when the demand
for the intervention is greater than the number of available program slots so that
participation in the program can be assigned based on a lottery.
In educational settings, random assignment of schools, classrooms, or teachers to
intervention and comparison groups is more likely to be feasible than random assignment of
students. Whether random assignment of teachers to the intervention is possible will often
depend on the type of intervention being evaluated. For example, if an important
part of the intervention requires teachers within a grade level to use common school
planning time to discuss and share instructional strategies and tips on implementing
the program, then it may not be appropriate to implement this type of program in
some but not all classrooms within a given grade level. In this example, the school
may be the most practical unit of random assignment.
One way to make the prospect of random assignment more appealing to schools, teachers,
and parents is to promise that after the initial intervention year or years, the
intervention will be expanded to include all classes and teachers if the program
proves to be effective (i.e., a "delayed intervention" design). In this approach,
the initial "intervention" group can be viewed as the "pilot" group.
Other comparison group designs
In cases where random assignment is not feasible, other comparative designs may be
used to yield evidence of intervention effectiveness, although such evidence will not
be as conclusive as evidence from designs using random assignment. These non-experimental
designs are often referred to as quasi-experiments. Most of these designs use comparison
groups and criteria for group assignment that attempt to make the groups as equivalent as
possible prior to the intervention.
Quasi-experimental studies are most conclusive when
- sound matching techniques are used to select the groups so that they have equivalent
pre-existing characteristics, and
- statistical controls are used in the data analysis to adjust estimates of program
impacts for a pre-intervention measure of the outcome and any other measurable pre-existing
differences that could not be eliminated through matching.
To maximize the match, choose the comparison group participants from the same local
population whenever possible. When comparison group participants are chosen from within
the same school or district as the intervention group it is more likely that the
participants in both groups will be exposed to the same external influences (e.g.,
change in district policy or leadership) during the course of the study, thus ruling out
the possibility that differences in external influences may explain any differences in
post-intervention outcomes. In addition, the selection of comparison group
participants from a local population will increase the chances that the participants in
the intervention and comparison groups will be similar on important characteristics
(both measured and unmeasured) prior to the introduction of the intervention.
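The sketch below illustrates one simple matching approach: each intervention participant is paired with the unmatched comparison-pool member whose pretest score is closest. The data and the single matching variable are hypothetical (and not from the original text); a real evaluation would typically match on several characteristics and add statistical controls in the analysis, as noted above.

    # Minimal sketch (assumed example): nearest-neighbor matching of comparison
    # students to intervention students on a pre-intervention score. It assumes the
    # comparison pool is at least as large as the intervention group.

    def match_on_pretest(intervention, comparison_pool):
        """Pair each intervention participant with the closest unmatched comparison participant."""
        available = list(comparison_pool)
        pairs = []
        for person in intervention:
            best = min(available, key=lambda c: abs(c["pretest"] - person["pretest"]))
            available.remove(best)
            pairs.append((person, best))
        return pairs

    treated = [{"id": "T1", "pretest": 71}, {"id": "T2", "pretest": 64}]
    pool = [{"id": "C1", "pretest": 70}, {"id": "C2", "pretest": 60}, {"id": "C3", "pretest": 66}]
    for t, c in match_on_pretest(treated, pool):
        print(t["id"], "matched with", c["id"])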
One-group pretest-posttest design
An even less conclusive yet commonly used design is known as the "one-group pretest-
posttest design." This design is typically used when a comparison group is not readily
available or evaluation resources are limited. It involves comparing the performance of
intervention participants on a particular outcome (such as a test) before and after an
intervention (and sometimes in the midst of the intervention) without the use of a
nonintervention comparison group. The assumption is that any change that takes
place between the pretest and posttest measure of the outcome can be attributed to
exposure to the intervention. Although the design is capable of measuring changes in
student performance, you cannot be certain that any gains in performance on the
outcome measure were caused by the intervention. Gains in student performance might
reflect the rate at which student performance on the outcome was changing before the
intervention was introduced. In addition, other policies and programs in the school or
district that coincided with the introduction of the intervention might have been
responsible for the improvements in student performance.
An improvement on the one-group pretest-posttest design is the short interrupted time
series design. This design attempts to take into account that students, classrooms, or
schools may have been on a particular performance trajectory before the arrival of the
intervention. For example, in early elementary school, students’ reading skills increase
dramatically as a result of normative cognitive development. Thus, in an evaluation of
the effect of a year-long reading intervention during these years, the evaluators must
account for the fact that reading skills would likely improve even without the
intervention. The short interrupted time series design does this by comparing trends in
performance on the outcome measure before and after the introduction of the intervention.
Measuring these trends requires data on the outcome measure for several semesters or
years before and after the introduction of the intervention.
These pre-intervention trends are then compared to the trends in performance on the
outcomes following the introduction of the intervention to assess whether any change in
trends occurred concurrently with the introduction of the intervention, thus suggesting
the intervention may have caused this change. Like all one-group designs, this one still
cannot rule out the possibility that some other factor that coincided with the
introduction of the intervention is responsible for any measured difference in
trends.
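The following sketch (Python, with hypothetical yearly average scores not drawn from the source) shows the basic arithmetic of a short interrupted time series comparison: fit a linear trend to the pre-intervention years, project it forward, and compare the projection with what was actually observed after the intervention was introduced.

    # Minimal sketch (assumed example): projecting the pre-intervention trend and
    # comparing it with observed post-intervention performance.
    import statistics

    def linear_trend(xs, ys):
        """Least-squares slope and intercept for a simple linear trend."""
        mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
        num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        den = sum((x - mean_x) ** 2 for x in xs)
        slope = num / den
        return slope, mean_y - slope * mean_x

    # Hypothetical data: four pre-intervention years of rising scores, then the
    # intervention is introduced and two more years of scores are observed.
    pre_years, pre_scores = [1, 2, 3, 4], [60, 62, 64, 66]
    post_years, post_scores = [5, 6], [71, 74]

    slope, intercept = linear_trend(pre_years, pre_scores)
    for year, observed in zip(post_years, post_scores):
        projected = slope * year + intercept
        print(f"year {year}: projected {projected:.1f}, observed {observed}, gap {observed - projected:+.1f}")

In this made-up example the pre-intervention trend is about two points per year, so post-intervention gains beyond that projection are what the design would attribute, tentatively, to the intervention.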
Unfortunately, all designs that rely on a matching technique, rather than random
assignment, to select comparison group participants are limited in their ability to
provide conclusive evidence of the true impact of an intervention. This is because
evaluators can never be sure that they have matched the groups on all the important
variables that predict participation in the intervention group and scores on the
outcome variable. This is particularly problematic when participation in an intervention
is voluntary. If participation in the intervention is voluntary, it is highly likely
that the intervention group participants (schools, teachers, or students) are different in
important ways from members of the population who had the same opportunity to participate
in the intervention but did not. As a result, any post-intervention differences in
achievement that are found between the groups may be the result not of the intervention
but of differences between the groups that existed prior to the intervention. For this
reason, evidence for program impacts based on quasi-experimental designs is considered
weaker than evidence based on properly implemented experimental (random assignment)
designs.
Monitoring the implementation of comparison group designs
Once a well-matched comparison group design is implemented, it must be carefully
monitored to ensure that the intervention and comparison groups remain comparable
throughout the study and that members of the comparison group are not exposed to the
intervention. If participants in the intervention and comparison groups leave the study at
different rates and for different reasons, the comparability between the two groups that may
have existed prior to the intervention may no longer exist by the end of the study.
Thus attrition out of the study must be carefully monitored in both groups and the
reason for leaving should be documented. In addition, the extent to which
participants in comparison groups are exposed to an intervention should also be
monitored. This is often called "spillover" or "contamination" and is more likely
when comparison groups are selected from within the same school or other setting
where the participants frequently interact. The existence of intervention spillover into
the comparison group will potentially weaken the study's ability to find effects
(positive or negative) of the intervention.
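As one way of making this monitoring concrete, the sketch below (Python, with hypothetical participant records and reason codes not taken from the source) computes the attrition rate and tallies the documented reasons for leaving, separately for each group.

    # Minimal sketch (assumed example): monitoring attrition rates and reasons
    # for leaving, by group. Records and reason codes are placeholders.
    from collections import Counter

    def attrition_summary(participants):
        """Attrition rate and tally of reasons for leaving, by group."""
        summary = {}
        for group in ("intervention", "comparison"):
            members = [p for p in participants if p["group"] == group]
            left = [p for p in members if p["left_study"]]
            summary[group] = {
                "rate": len(left) / len(members) if members else 0.0,
                "reasons": Counter(p["reason"] for p in left),
            }
        return summary

    roster = [
        {"group": "intervention", "left_study": True, "reason": "moved out of district"},
        {"group": "intervention", "left_study": False, "reason": None},
        {"group": "comparison", "left_study": False, "reason": None},
        {"group": "comparison", "left_study": True, "reason": "switched schools"},
    ]
    print(attrition_summary(roster))

A noticeably higher attrition rate, or a different mix of reasons, in one group than the other would warn that the groups may no longer be comparable.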
Collecting data on implementation and classroom instruction in intervention and
comparison groups
To assess the extent to which the instructional experiences of participants in the
intervention and comparison groups truly differed, the evaluators should collect
comparable data about implementation conditions in intervention and comparison groups.
Examples of such data include the curriculum used, the professional development
offered to teachers, the amount of instructional time in the subject area, the
prevalence of certain instructional practices and strategies that are the focus of
the intervention, and the presence of other instructional reforms that may be taking
place in both groups and that may impact performance on the outcomes. These comparative
data will allow evaluators to do a better job of interpreting the results of the impact
analyses.
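One possible way to organize such comparable implementation data is sketched below (Python); the record structure and field names are hypothetical, suggested only by the examples listed above, and an actual evaluation would adapt them to its own instruments.

    # Minimal sketch (assumed example): a record for comparable implementation data
    # collected in both intervention and comparison classrooms.
    from dataclasses import dataclass, field

    @dataclass
    class ClassroomImplementationRecord:
        group: str                       # "intervention" or "comparison"
        curriculum: str                  # curriculum in use
        pd_hours: float                  # professional development offered to the teacher
        weekly_instruction_minutes: int  # instructional time in the subject area
        target_practices_observed: int   # count of focal instructional practices observed
        other_reforms: list = field(default_factory=list)  # concurrent reforms in the school

    records = [
        ClassroomImplementationRecord("intervention", "New Reading Program", 12.0, 300, 5),
        ClassroomImplementationRecord("comparison", "Existing Basal Series", 2.0, 270, 1,
                                      ["new literacy block"]),
    ]

Collecting the same fields in both conditions is what allows the instructional experiences of the two groups to be contrasted when interpreting the impact analyses.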