
Step 4: Decide what, if any, comparison outcome data will be collected (P, R).

(P) = plan example
(R) = report example

If the intervention is supposed to solve a problem better than existing alternatives, you need a way to demonstrate whether it in fact does. The most direct way to do so is to compare your intervention with those alternatives, so that if you observe what appears to be an effect, you can more plausibly argue that it is a true effect and not the result of other factors associated with the setting.

The concepts of internal validity and external validity are important to discussions of comparative evaluation. Internal validity means that the findings truly measure the effect of the intervention rather than the effects of other variables that might affect the outcome. External validity means that the findings are generalizable to the entire population of individuals for whom the intervention is intended. To achieve internal and external validity, the evaluation must compare outcomes gathered in intervention and nonintervention contexts and include a sample of schools, teachers, and students that is representative of the populations the district serves.

There are different evaluation designs that make such comparisons possible. In choosing which design to use, your underlying goal is to identify the design that best captures the true impact the program might have while ruling out alternative explanations for the results of your evaluation. To do this, use a design that allows you to simulate what would have happened to members of the intervention group if they had not received the intervention or had received an alternative intervention (also known as the counterfactual condition).

Thus you want to make sure that your design helps you achieve the following:

  • Prior to the introduction of the intervention, members of the intervention and comparison groups are as equivalent as possible in characteristics that might be associated with the outcome measure. Examples of characteristics are prior skills and abilities, or demographic characteristics such as age, gender, or family background.

  • Both groups are exposed to the same external influences that might affect performance on the outcome measure you are using to provide evidence of whether the intervention is having the intended impacts. Examples of external influences include student or teacher attrition, or changes in district policies that affect the characteristics of classroom instruction.

If the groups are equivalent at the start of the study, and they are exposed to the same external influences during the study, you can be more confident that any difference in the outcomes for these groups was caused by exposure to the intervention and not by some other factor such as differences between the groups in their prior academic achievement or changes in district policy that affected one group but not the other.
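
As a concrete illustration, the short Python sketch below checks baseline equivalence by comparing group means and standardized mean differences on a prior achievement score and on age. The data and column names are entirely hypothetical; this is a minimal sketch of the idea, not a prescribed procedure.

    # Minimal sketch of a baseline equivalence check (hypothetical data).
    # For each baseline characteristic, compare group means and compute a
    # standardized mean difference; large differences signal that the groups
    # were not equivalent before the intervention began.
    import numpy as np
    import pandas as pd

    baseline = pd.DataFrame({
        "group":       ["intervention"] * 4 + ["comparison"] * 4,
        "prior_score": [412, 455, 430, 470, 418, 448, 425, 462],
        "age":         [9.1, 9.4, 9.0, 9.6, 9.2, 9.3, 9.1, 9.5],
    })

    for col in ["prior_score", "age"]:
        treat = baseline.loc[baseline["group"] == "intervention", col]
        comp = baseline.loc[baseline["group"] == "comparison", col]
        pooled_sd = np.sqrt((treat.var() + comp.var()) / 2)
        smd = (treat.mean() - comp.mean()) / pooled_sd
        print(f"{col}: intervention mean={treat.mean():.2f}, "
              f"comparison mean={comp.mean():.2f}, standardized difference={smd:.2f}")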

Experimental designs

Properly implemented experimental designs provide the strongest evidence of the effectiveness of a program. Because experimental designs require that participants be randomly assigned to intervention and comparison groups, these designs provide the highest assurance that the groups are equivalent. Then, when a difference is observed between outcomes in the two groups, the strongest case can be made that the difference is due to participants' exposure to the intervention rather than to differences in the characteristics of the participants that existed prior to the introduction of the intervention. When, for example, schools, teachers, or students with similar characteristics are assigned purely at random to the intervention and comparison groups, each participant has an equal chance of being assigned to either group. If the two groups are large enough, researchers can assume that the different types of schools, teachers, or students will be evenly distributed between the groups.

As powerful as experimental designs are, the need to randomly assign participants to groups can create challenges for evaluators studying certain interventions in certain settings. For example, participation in an intervention may need to be voluntary or, conversely, all members of the target population may be required to participate. Also, by the time the evaluator gets involved, it may be too late for random assignment because the participants have already been selected or the program is already in place. In school districts, an experimental design is more likely to be feasible if the district can limit the intervention to only some of the schools, teachers, or students while offering an alternative program to the others. Experimental designs can also be used when demand for the intervention exceeds the number of available program slots, so that participation can be assigned by lottery.

In educational settings, random assignment of schools, classrooms, or teachers to intervention and comparison groups is more likely to be feasible than random assignment of students. Whether random assignment of teachers to the intervention is possible often depends on the type of intervention being evaluated. For example, if an important part of the intervention requires teachers within a grade level to use common school planning time to discuss and share instructional strategies and tips on implementing the program, then it may not be appropriate to implement the program in some but not all classrooms within a given grade level. In that case, the school may be the most practical unit of assignment.
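
To make the mechanics concrete, here is a minimal Python sketch of a school-level random assignment lottery. The school names are hypothetical, and the fixed seed simply makes the lottery reproducible so that the assignment can be re-run and verified.

    # Minimal sketch of school-level random assignment (hypothetical schools).
    import random

    schools = ["Adams", "Baker", "Carver", "Dunbar", "Elmwood", "Franklin"]

    rng = random.Random(2024)   # fixed seed so the lottery can be audited later
    shuffled = schools[:]
    rng.shuffle(shuffled)

    half = len(shuffled) // 2
    intervention_group = shuffled[:half]
    comparison_group = shuffled[half:]

    print("Intervention schools:", sorted(intervention_group))
    print("Comparison schools:  ", sorted(comparison_group))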

One way to make the prospect of random assignment more appealing to schools, teachers, and parents is to promise that after the initial intervention year or years, the intervention will be expanded to include all classes and teachers if the program proves to be effective (i.e., a "delayed intervention" design). In this approach, the initial "intervention" group can be viewed as the "pilot" group.

Other comparison group designs

In cases where random assignment is not feasible, other comparative designs may be used to yield evidence of intervention effectiveness, although such evidence will not be as conclusive as evidence from designs using random assignment. These non-experimental designs are often referred to as quasi-experiments. Most of these designs use comparison groups and criteria for group assignment that attempt to make the groups as equivalent as possible prior to the intervention.

Quasi-experimental studies are most conclusive when

  1. sound matching techniques are used to select the groups so that they have equivalent pre-existing characteristics, and

  2. statistical controls are used in data analysis to adjust estimates of program impacts for a pre-measure of the outcome and any other measurable pre-existing differences that could not be avoided through matching.

To maximize the match, choose comparison group participants from the same local population whenever possible. When comparison group participants are chosen from within the same school or district as the intervention group, it is more likely that participants in both groups will be exposed to the same external influences (e.g., a change in district policy or leadership) during the course of the study, which helps rule out the possibility that differences in external influences explain any differences in post-intervention outcomes. In addition, selecting comparison group participants from a local population increases the chances that participants in the intervention and comparison groups will be similar on important characteristics (both measured and unmeasured) prior to the introduction of the intervention.
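
The Python sketch below illustrates, with simplified and entirely hypothetical data, the two elements listed above: nearest-neighbor matching on a single pre-measure of the outcome, followed by a regression adjustment for that pre-measure. A real evaluation would match on several characteristics at once and use more robust estimation, so treat this only as a sketch of the logic.

    # Minimal sketch of matching plus statistical adjustment (hypothetical data).
    import numpy as np
    import pandas as pd

    # Hypothetical student records: pretest, posttest, and intervention status.
    rng = np.random.default_rng(0)
    n = 200
    pretest = rng.normal(500, 50, n)
    treated = (rng.random(n) < 0.4).astype(int)
    posttest = pretest + 10 * treated + rng.normal(0, 20, n)
    df = pd.DataFrame({"pretest": pretest, "posttest": posttest, "treated": treated})

    # (1) For each intervention participant, find the comparison participant with
    #     the closest pretest score (matching with replacement, for simplicity).
    treat_df = df[df["treated"] == 1]
    pool = df[df["treated"] == 0]
    matches = [pool.iloc[(pool["pretest"] - row.pretest).abs().argmin()]
               for row in treat_df.itertuples()]
    matched_comparison = pd.DataFrame(matches)
    print("Matched difference in posttest means:",
          round(treat_df["posttest"].mean() - matched_comparison["posttest"].mean(), 1))

    # (2) Regression adjustment: posttest = a + b*treated + c*pretest + error,
    #     estimated by ordinary least squares; b is the adjusted impact estimate.
    X = np.column_stack([np.ones(n), df["treated"], df["pretest"]])
    coef, *_ = np.linalg.lstsq(X, df["posttest"], rcond=None)
    print("Regression-adjusted impact estimate:", round(coef[1], 1))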

One-group pretest-posttest design

An even less conclusive yet commonly used design is the "one-group pretest-posttest design." This design is typically used when a comparison group is not readily available or evaluation resources are limited. It involves comparing the performance of intervention participants on a particular outcome (such as a test) before and after the intervention (and sometimes in the midst of the intervention) without the use of a nonintervention comparison group. The assumption is that any change between the pretest and posttest measures of the outcome can be attributed to exposure to the intervention. Although the design is capable of measuring changes in student performance, you cannot be certain that any gains in performance on the outcome measure were caused by the intervention. Gains might simply reflect the rate at which student performance on the outcome was already changing before the intervention was introduced. In addition, other policies and programs in the school or district that coincided with the introduction of the intervention might have been responsible for the improvements in student performance.
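
As a minimal illustration, the Python sketch below compares hypothetical pretest and posttest scores for a single group using a paired t-test. Even a statistically reliable gain in this design cannot, for the reasons above, be attributed to the intervention with confidence.

    # Minimal sketch of a one-group pretest-posttest comparison (hypothetical scores).
    from scipy import stats

    pretest  = [62, 70, 55, 68, 74, 60, 66, 71]   # same students, before
    posttest = [68, 75, 61, 70, 80, 63, 72, 78]   # same students, after

    gains = [post - pre for pre, post in zip(pretest, posttest)]
    print("Average gain:", sum(gains) / len(gains))

    # Paired t-test of whether the average gain differs from zero.
    t_stat, p_value = stats.ttest_rel(posttest, pretest)
    print(f"Paired t-test: t={t_stat:.2f}, p={p_value:.3f}")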

An improvement on the one-group pretest-posttest design is the short interrupted time series design. This design attempts to take into account that students, classrooms, or schools may have been on a particular performance trajectory before the arrival of the intervention. For example, in early elementary school, students’ reading skills increase dramatically as a result of normative cognitive development. Thus, in an evaluation of a year-long reading intervention during these years, the evaluators must account for the fact that reading skills would likely improve even without the intervention. The short interrupted time series design does this by comparing trends in performance on the outcome measure before and after the introduction of the intervention. Measuring these trends requires data on the outcome measure for several semesters or years before and after the introduction of the intervention. The pre-intervention trends are then compared with the post-intervention trends to assess whether any change in trend occurred concurrently with the introduction of the intervention, which would suggest that the intervention may have caused the change. Like all one-group designs, this one still cannot rule out the possibility that some other factor that coincided with the introduction of the intervention is responsible for any difference in trends.
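
One common way to analyze a short interrupted time series is a segmented regression that estimates the pre-intervention trend along with any change in level and slope after the intervention begins. The Python sketch below uses hypothetical yearly averages and ordinary least squares; it is meant only to show the structure of the comparison.

    # Minimal sketch of a segmented regression for a short interrupted time series.
    import numpy as np

    scores = np.array([480, 484, 489, 493, 505, 512, 519])  # hypothetical yearly averages
    time = np.arange(len(scores))                            # 0, 1, 2, ... (years)
    post = (time >= 4).astype(float)                         # 1 in years after the intervention
    time_since = np.where(post == 1, time - 4, 0.0)          # years since the intervention began

    # Trend model: score = a + b*time + c*post + d*time_since + error
    X = np.column_stack([np.ones(len(scores)), time, post, time_since])
    a, b, c, d = np.linalg.lstsq(X, scores, rcond=None)[0]
    print(f"Pre-intervention slope: {b:.1f} points per year")
    print(f"Level change at intervention: {c:.1f} points")
    print(f"Slope change after intervention: {d:.1f} points per year")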

Unfortunately, all designs that rely on some matching technique other than random assignment to select comparison group participants are limited in their ability to provide conclusive evidence of the true impact of an intervention. This is because evaluators can never be sure that they have matched the groups on all the important variables that predict participation in the intervention and scores on the outcome variable. This is particularly problematic when participation in an intervention is voluntary. If participation is voluntary, it is highly likely that the intervention group participants (schools, teachers, or students) differ in important ways from members of the population who had the same opportunity to participate but did not. As a result, any post-intervention differences in achievement found between the groups may be the result not of the intervention but of differences between the groups that existed prior to the intervention. For this reason, evidence of program impacts based on quasi-experimental designs is considered weaker than evidence based on properly implemented experimental (random assignment) designs.

Monitoring the implementation of comparison group designs

Once a well-matched comparison group design is implemented, it must be carefully monitored to ensure that the intervention and comparison groups remain comparable throughout the study and that members of the comparison group are not exposed to the intervention. If participants in the intervention and comparison groups leave the study at different rates and for different reasons, the comparability of the two groups that may have existed prior to the intervention may no longer exist by the end of the study. Thus attrition from the study must be carefully monitored in both groups, and the reasons for leaving should be documented. In addition, the extent to which participants in the comparison group are exposed to the intervention should also be monitored. This is often called "spillover" or "contamination" and is more likely when comparison groups are selected from within the same school or other setting where participants frequently interact. Intervention spillover into the comparison group can weaken the study's ability to detect effects (positive or negative) of the intervention.
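
A simple way to keep attrition visible is to maintain a participant log and summarize exit rates and reasons by group, as in the hypothetical Python sketch below.

    # Minimal sketch of attrition monitoring (hypothetical participant log).
    import pandas as pd

    log = pd.DataFrame({
        "group":      ["intervention"] * 5 + ["comparison"] * 5,
        "left_study": [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
        "reason":     [None, "moved", None, None, "opted out",
                       None, None, None, "moved", None],
    })

    # Differential attrition rates flag a threat to group comparability.
    print("Attrition rate by group:")
    print(log.groupby("group")["left_study"].mean())

    print("Reasons for leaving, by group:")
    print(log[log["left_study"] == 1].groupby(["group", "reason"]).size())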

Collecting data on implementation and classroom instruction in intervention and comparison groups

To assess the extent to which the instructional experiences of participants in the intervention and comparison groups truly differed, evaluators should collect comparable data about implementation conditions in both groups. Examples of such data include the curriculum used, the professional development offered to teachers, the amount of instructional time in the subject area, the prevalence of the instructional practices and strategies that are the focus of the intervention, and the presence of other instructional reforms that may be taking place in both groups and that may affect performance on the outcomes. These comparative data allow evaluators to do a better job of interpreting the results of the impact analyses.
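
One practical way to keep such data comparable is to collect the same record structure in both groups. The Python sketch below defines a hypothetical record; the field names are illustrative, not a required template.

    # Minimal sketch of a comparable implementation-data record (hypothetical fields).
    from dataclasses import dataclass

    @dataclass
    class ImplementationRecord:
        group: str                       # "intervention" or "comparison"
        classroom_id: str
        curriculum: str                  # curriculum in use
        pd_hours: float                  # professional development offered to the teacher
        weekly_instruction_minutes: int  # instructional time in the subject area
        target_practices_observed: int   # count of focal instructional practices observed
        other_reforms: list[str]         # concurrent reforms that may affect outcomes

    # Using the same structure for both groups keeps the data directly comparable.
    record = ImplementationRecord(
        group="comparison", classroom_id="C-101", curriculum="District standard",
        pd_hours=2.0, weekly_instruction_minutes=225,
        target_practices_observed=1, other_reforms=["new literacy block"],
    )
    print(record)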