Step 6: Maximize the efficiency of your sample (P).
Evaluations are ultimately constrained by economic considerations. Given limited time
and money, how can one extract the maximum amount of useful information from an
evaluation? One key decision entails choosing an appropriate sample size (see Step 6).
When the subjects of an evaluation study are naturally clustered in larger units (such as
districts, schools, or classrooms), the question of optimal sample size is more
complicated. For example, if an interview budget allows for 100 face-to-face interviews,
is it better to interview 5 students in each of 20 classrooms, 10 students in each of 10
classrooms, or 20 students in each of 5 classrooms? Which of these scenarios is likely to
yield the greatest information (and, consequently, best reduce the probability of Type I
and Type II errors)? The idea of efficiency is that one should try to maximize the
ratio of information gained to total cost.
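One common way to make this comparison concrete is the design effect, 1 + (m - 1) x ICC, where m is the number of students per classroom and ICC is the intraclass correlation (the share of outcome variance that lies between classrooms rather than within them). The sketch below applies it to the three 100-interview scenarios. The ICC of 0.10 is an assumed value chosen only for illustration, not a figure from this guide, and the function names are hypothetical.

```python
def design_effect(students_per_class: int, icc: float) -> float:
    """Variance inflation from sampling students in equal-sized clusters."""
    return 1.0 + (students_per_class - 1) * icc


def effective_sample_size(classes: int, students_per_class: int, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    total = classes * students_per_class
    return total / design_effect(students_per_class, icc)


ASSUMED_ICC = 0.10  # illustrative assumption; estimate it from historical data

# (classrooms, students per classroom) for the three 100-interview scenarios
scenarios = [(20, 5), (10, 10), (5, 20)]
for classes, per_class in scenarios:
    n_eff = effective_sample_size(classes, per_class, ASSUMED_ICC)
    print(f"{classes} classrooms x {per_class} students: effective n = {n_eff:.1f}")
```

Under this assumed ICC, the same 100 interviews are worth roughly 71, 53, and 34 independent observations respectively, so spreading the interviews across more classrooms yields more information for the same number of interviews.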
Evaluations often classify their subjects by dividing them into hierarchical levels, such as
districts, schools, classes, and students. In your sampling procedure, you can select
members in each level in a way that maximizes the sample's power for detecting an effect.
This is done by maximizing your sample size at the broadest ("primary") level and
minimizing the size at more granular ("secondary") levels, subject to the resources
available.
Consider an evaluation of a new curriculum that is being piloted in classes at a
university. The classes have been randomly assigned to intervention and control groups.
Table 12 shows the different consequences of sampling various numbers of classes and
students per class under the assumption that the expenses associated with the experiment
depend on the total number of students randomized to condition (i.e., the expense does
not depend on how many classes those students are distributed among). Whereas
alternative #1 is the worst because it has low power, alternative #4 is the best because
it has both high power and high efficiency.
Table 12. Structuring a sample and consequences for power and efficiency.
| Alternative | Structure of sample | Consequences |
|---|---|---|
| 1 | 10 classes and 15 students per class | Lower power |
| 2 | 10 classes and 30 students per class | Slightly higher power than #1 |
| 3 | 20 classes and 30 students per class | High power but at a high price (inefficient) |
| 4 | 20 classes and 15 students per class | Slightly lower power than #3, but at a much lower price (efficient) |
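The pattern in Table 12 can be reproduced with a rough power calculation. The sketch below uses a standard normal approximation for a two-arm design in which whole classes are randomized. The effect size of 0.5 standard deviations and the ICC of 0.10 are assumptions made for illustration, the function name is hypothetical, and the calculation assumes equal class sizes and an even split of classes between the intervention and control groups.

```python
from math import sqrt
from statistics import NormalDist

ASSUMED_EFFECT = 0.5  # standardized effect size; illustrative assumption
ASSUMED_ICC = 0.10    # intraclass correlation; illustrative assumption


def approx_power(classes, students_per_class, effect_size, icc, alpha=0.05):
    """Approximate power when whole classes are randomized to two arms.

    Assumes a standardized outcome (total variance 1), equal class sizes,
    and half of the classes in each arm. Normal approximation only.
    """
    # Variance of the estimated difference between the two arm means
    variance = 4.0 * (icc + (1.0 - icc) / students_per_class) / classes
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(effect_size / sqrt(variance) - z_crit)


# Alternative number -> (classes, students per class), as in Table 12
alternatives = {1: (10, 15), 2: (10, 30), 3: (20, 30), 4: (20, 15)}
for label, (classes, per_class) in alternatives.items():
    power = approx_power(classes, per_class, ASSUMED_EFFECT, ASSUMED_ICC)
    students = classes * per_class  # the cost driver under the stated assumption
    print(f"#{label}: {students} students total, approximate power = {power:.2f}")
```

With these assumed inputs, the approximate power is about 0.51, 0.59, 0.87, and 0.80 for alternatives #1 through #4, using 150, 300, 600, and 300 students. Alternative #4 gives up a little power relative to #3 while using half as many students, and it clearly outperforms #2 at the same total. The normal approximation is optimistic when the number of classes is small, so treat these figures as rough.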
In general, deciding how to allocate a sample among the different levels will depend on the costs associated with adding a unit at each level. A statistician can also provide some guidance about the implications of different allocations by examining historical data to estimate how the variability of the outcome measure is likely to be distributed among schools, between teachers within schools, between classrooms within teachers, and between students within classrooms.
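When adding a class carries its own cost, a standard result for two-level designs gives the number of students per class that minimizes the variance of the estimate for a fixed budget: the square root of (cost per class / cost per student) x (1 - ICC) / ICC. The sketch below applies that formula. All cost figures, the budget, and the ICC are hypothetical values chosen for illustration, as are the function names; the ICC must lie strictly between 0 and 1.

```python
from math import floor, sqrt


def optimal_students_per_class(cost_per_class, cost_per_student, icc):
    """Students per class that minimize variance for a fixed total budget."""
    return sqrt((cost_per_class / cost_per_student) * (1.0 - icc) / icc)


def affordable_classes(budget, cost_per_class, cost_per_student, students_per_class):
    """Largest whole number of classes the budget covers at that class size."""
    cost_of_one_class = cost_per_class + students_per_class * cost_per_student
    return floor(budget / cost_of_one_class)


# Hypothetical inputs: none of these figures come from the guide
BUDGET = 10_000.0
COST_PER_CLASS = 300.0
COST_PER_STUDENT = 10.0
ASSUMED_ICC = 0.10

n_students = round(optimal_students_per_class(COST_PER_CLASS, COST_PER_STUDENT, ASSUMED_ICC))
n_classes = affordable_classes(BUDGET, COST_PER_CLASS, COST_PER_STUDENT, n_students)
print(f"Sample about {n_students} students in each of {n_classes} classes")
```

The formula shows why the allocation depends on relative costs: the cheaper a class is relative to a student, or the larger the ICC, the fewer students per class are worth sampling.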
Getting large sample sizes at the primary unit of analysis can be difficult because large numbers of them may not be available. If this is the situation you face, you need to make trade-offs. The following is an example of a population in which there are not enough primary units of analysis to generate much power. This fact forces a reconceptualization of what the primary unit needs to be and the formulation of a strategy for minimizing the resultant risks of bias.
Example of maximizing efficiency of a sample
A district is conducting an evaluation of a curriculum intervention. To avoid contamination from interactions between intervention and control teachers, which would be likely to happen if they were within the same school, the evaluators make the school the primary unit of analysis. In other words, they want schools to be randomly assigned to intervention and control groups, so that the participating teachers in each school all belong to the same group. Getting sufficient power requires selecting a large number of schools. Unfortunately, only six schools in the district are available to participate.
The evaluators decide that the risk from contamination is not as important to them as the risk from low power, so they make the teacher the unit of randomization. They try to minimize contamination by asking the intervention teachers to refrain from talking about the intervention with the control teachers.
A note for the statistical analyst
When your sample is composed of clusters of units (for example, schools are
clusters of teachers, and teachers have clusters of students), the default procedures for
calculating many statistics are not valid without further modification. Many statistics
packages such as SPSS and SAS require the analyst to specify how the different levels of
data are related in order to correctly compute confidence intervals and significance
levels. Failure to consider the nested or clustered nature of the data is likely to
lead to an overabundance of Type I errors (false positives).
Similarly, many statistical power calculators available on the World Wide Web do not include allowances for clustered data. Power analysis with clustered data is a more complicated procedure than with non-clustered data (http://sitemaker.umich.edu/group-based/optimal_design_software).
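To give a sense of how large the Type I error problem can be, the sketch below estimates the actual false positive rate of a test that treats clustered students as independent when comparing groups that were assigned by class. It relies on the design effect, 1 + (m - 1) x ICC, and a normal approximation. The class size of 15 and the ICC of 0.10 are assumptions made for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def actual_type1_rate(students_per_class, icc, nominal_alpha=0.05):
    """Approximate false positive rate when clustering is ignored.

    Assumes equal class sizes and a predictor that is constant within
    each class (for example, class-level assignment to condition).
    """
    deff = 1.0 + (students_per_class - 1) * icc
    z_crit = NormalDist().inv_cdf(1.0 - nominal_alpha / 2.0)
    # The naive standard error is too small by a factor of sqrt(deff),
    # so the naive test rejects whenever the correct statistic exceeds
    # z_crit / sqrt(deff) instead of z_crit.
    return 2.0 * (1.0 - NormalDist().cdf(z_crit / sqrt(deff)))


rate = actual_type1_rate(students_per_class=15, icc=0.10)
print(f"Nominal 5% test, actual false positive rate is about {rate:.0%}")
```

Under these assumptions, a test run at a nominal 5 percent level would produce false positives roughly 21 percent of the time, which is why the levels of the data need to be specified in the analysis.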