
Evaluations are ultimately constrained by economic considerations. Given limited time and money, how can an evaluator extract the most useful information from an evaluation? One key decision is choosing an appropriate sample size (see Step 6). When the subjects of an evaluation study are naturally clustered in larger units (such as districts, schools, or classrooms), the question of optimal sample size becomes more complicated. For example, if an interview budget allows for 100 face-to-face interviews, is it better to interview 5 students in each of 20 classrooms, 10 students in each of 10 classrooms, or 20 students in each of 5 classrooms? Which of these scenarios is likely to yield the most information (and, consequently, best reduce the probability of Type I and Type II errors)? The idea of efficiency is that one should try to maximize the ratio of information gained to total cost.
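To make the efficiency question concrete, the sketch below compares the three 100-interview designs using the Kish design effect, 1 + (m - 1)ρ, where m is the number of students interviewed per classroom and ρ is the intraclass correlation (ICC), the share of outcome variance that lies between classrooms. The ICC of 0.10 is a hypothetical assumption, not a figure from this module, and the sketch assumes every interview costs the same no matter how many classrooms are visited.

```python
# A minimal sketch: how much independent information 100 interviews
# provide when they are split across classrooms.
# ASSUMPTION: the ICC value below is hypothetical, chosen for illustration.

def design_effect(m: int, icc: float) -> float:
    """Kish design effect for equal-sized clusters of m subjects."""
    return 1 + (m - 1) * icc


def effective_n(n_clusters: int, m: int, icc: float) -> float:
    """Number of independent observations the clustered sample is 'worth'."""
    return n_clusters * m / design_effect(m, icc)


ICC = 0.10  # hypothetical: 10% of outcome variance lies between classrooms

for n_clusters, m in [(20, 5), (10, 10), (5, 20)]:
    print(f"{n_clusters} classrooms x {m} students: "
          f"effective n = {effective_n(n_clusters, m, ICC):.1f}")
```

With this assumed ICC the loop prints effective sample sizes of 71.4, 52.6, and 34.5, so spreading the same 100 interviews over more classrooms yields more information. If each additional classroom carries its own cost (travel, scheduling), that conclusion can change.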

Evaluations often classify their subjects into hierarchical levels, such as districts, schools, classes, and students. In your sampling procedure, you can choose how many members to select at each level so as to maximize the sample's power to detect an effect. This is done by making the sample as large as possible at the broadest ("primary") level and keeping it small at the more granular ("secondary") levels, subject to the resources available.

Consider an evaluation of a new curriculum that is being piloted in classes at a university. The classes have been randomly assigned to intervention and control groups. Table 12 shows the consequences of sampling various numbers of classes and students per class, under the assumption that the cost of the experiment depends on the total number of students randomized to condition (i.e., the cost does not depend on how many classes those students are distributed among). Alternative #1 is the worst because it has low power, whereas alternative #4 is the best because it combines high power with high efficiency: it uses half as many students as #3 (300 versus 600) while giving up only a little power.

Table 12. Consequences of alternative sample structures
Alternative	Structure of sample	Total students	Consequences
1.	10 classes and 15 students per class	150	Lower power
2.	10 classes and 30 students per class	300	Slightly higher power than #1
3.	20 classes and 30 students per class	600	High power but at a high price (inefficient)
4.	20 classes and 15 students per class	300	Slightly lower power than #3, but at a much lower price (efficient)
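The pattern in Table 12 can be reproduced with a normal-approximation power formula for a design that randomizes whole classes, half to each condition. This is a minimal sketch, not the calculation behind the table: the ICC of 0.10 and the standardized effect size of 0.50 are hypothetical values chosen for illustration, and `cluster_power` is a helper written only for this example.

```python
from statistics import NormalDist

# ASSUMPTIONS (hypothetical, for illustration only):
ICC = 0.10     # share of outcome variance between classes
EFFECT = 0.50  # standardized effect size, in standard-deviation units
ALPHA = 0.05   # two-sided significance level

# Table 12 alternatives: (number of classes, students per class)
ALTERNATIVES = {1: (10, 15), 2: (10, 30), 3: (20, 30), 4: (20, 15)}


def cluster_power(n_classes, m, icc, effect, alpha=ALPHA):
    """Approximate power of a two-arm trial that randomizes whole classes.

    Half of the classes go to each arm. With total outcome variance 1,
    the variance of the difference in arm means is
    4 * (icc + (1 - icc) / m) / n_classes.
    """
    se = (4 * (icc + (1 - icc) / m) / n_classes) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Normal approximation that ignores the (negligible) opposite tail.
    return NormalDist().cdf(effect / se - z_crit)


for k, (n_classes, m) in ALTERNATIVES.items():
    p = cluster_power(n_classes, m, ICC, EFFECT)
    print(f"#{k}: {n_classes} classes x {m} students, "
          f"{n_classes * m} students total, power = {p:.2f}")
```

Under these assumptions the printed powers rank #3 > #4 > #2 > #1. Alternative #4 uses the same 300 students as #2 yet has more power, because doubling the number of classes halves the whole variance term, whereas doubling the students per class only halves its within-class part, (1 - icc) / m.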

In general, how best to allocate a sample among the different levels depends on the cost of adding a unit at each level. A statistician can also provide guidance about the implications of different allocations by examining historical data to estimate how the variability of the outcome measure is likely to be distributed among schools, between teachers within schools, between classrooms within teachers, and between students within classrooms.
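One way to act on this advice is sketched below, assuming a balanced two-level design (students within classrooms) and a simple cost model in which each classroom costs a fixed amount plus a fixed amount per student. `anova_icc` estimates the ICC from historical scores with the one-way ANOVA estimator, and `optimal_cluster_size` applies the standard two-level result m* = sqrt((C / c)(1 - ρ) / ρ), where C is the cost per classroom and c the cost per student. The historical scores, costs, and budget are all hypothetical, and the function names are illustrative.

```python
import math


def anova_icc(groups):
    """One-way ANOVA estimate of the ICC from equal-sized groups.

    groups: list of lists, e.g. historical scores grouped by classroom.
    """
    k, m = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (k * m)
    means = [sum(g) / m for g in groups]
    msb = m * sum((gm - grand) ** 2 for gm in means) / (k - 1)
    msw = sum((x - gm) ** 2
              for g, gm in zip(groups, means) for x in g) / (k * (m - 1))
    # Negative estimates are conventionally truncated at zero.
    return max(0.0, (msb - msw) / (msb + (m - 1) * msw))


def optimal_cluster_size(cost_per_cluster, cost_per_student, icc):
    """Students per cluster that maximizes precision for a fixed budget."""
    if not 0 < icc < 1:
        raise ValueError("formula requires 0 < icc < 1")
    return math.sqrt((cost_per_cluster / cost_per_student) * (1 - icc) / icc)


def clusters_affordable(budget, cost_per_cluster, cost_per_student, m):
    """Clusters that fit in the budget when each cluster holds m students."""
    return math.floor(budget / (cost_per_cluster + m * cost_per_student))


# HYPOTHETICAL historical scores: four classrooms of four students each.
history = [
    [64, 72, 66, 70],
    [68, 76, 70, 74],
    [66, 74, 68, 72],
    [70, 78, 72, 76],
]
icc = anova_icc(history)
m = round(optimal_cluster_size(cost_per_cluster=300, cost_per_student=20, icc=icc))
print(f"estimated ICC = {icc:.2f}, students per class = {m}, "
      f"classes affordable = {clusters_affordable(10000, 300, 20, m)}")
```

For these made-up data the estimated ICC is 0.20, which with a 15:1 cost ratio suggests about 8 students per class and 21 classes within a budget of 10,000. A higher ICC, or a lower cost per classroom, pushes the optimum toward fewer students in more classrooms. The ANOVA formula shown applies only to equal-sized groups.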

Obtaining a large sample of primary units can be difficult because few such units may be available. If you face this situation, you need to make trade-offs. The following example describes a population in which there are too few primary units to provide much power. This forces a reconceptualization of what the primary unit should be and the formulation of a strategy for minimizing the resulting risk of bias.

A district is conducting an evaluation of a curriculum intervention. To avoid contamination from interactions between intervention and control teachers, which would be likely if they worked in the same school, the evaluators make the school the primary unit of analysis. In other words, they want schools to be randomly assigned to intervention and control groups, so that all participating teachers in a given school are in the same condition. Achieving sufficient power would require a large number of schools. Unfortunately, only six schools in the district are available to participate.

The evaluators decide that the risk from contamination is not as important to them as the risk from low power, so they make the teacher the unit of randomization. They try to minimize contamination by asking the intervention teachers to refrain from talking about the intervention with the control teachers.
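The trade-off the evaluators face can be quantified with the minimum detectable effect size (MDES). The sketch below compares randomizing the six schools with randomizing teachers within those schools, using approximate variance formulas for a balanced three-level design (students within teachers within schools) and assuming the treatment effect does not vary across schools. Apart from the six schools, every design value (six teachers per school, 20 students per teacher, both ICCs of 0.10) is hypothetical.

```python
from statistics import NormalDist

# Only SCHOOLS comes from the example; the rest are HYPOTHETICAL values.
SCHOOLS = 6
TEACHERS_PER_SCHOOL = 6     # allows 3 intervention + 3 control per school
STUDENTS_PER_TEACHER = 20
ICC_SCHOOL = 0.10           # share of outcome variance between schools
ICC_TEACHER = 0.10          # share between teachers within schools


def mdes(variance_of_difference, alpha=0.05, power=0.80):
    """Minimum detectable effect size (SD units), normal approximation."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * variance_of_difference ** 0.5


resid = 1 - ICC_SCHOOL - ICC_TEACHER  # student-level share of variance
k, n = TEACHERS_PER_SCHOOL, STUDENTS_PER_TEACHER

# Randomize whole schools: 3 per arm, each school is one cluster.
var_school = 4 / SCHOOLS * (ICC_SCHOOL + ICC_TEACHER / k + resid / (k * n))

# Randomize teachers within each school (each school acts as a block),
# which removes between-school variance from the treatment comparison.
var_teacher = 4 / (SCHOOLS * k) * (ICC_TEACHER + resid / n)

print(f"MDES, schools randomized:  {mdes(var_school):.2f} SD")
print(f"MDES, teachers randomized: {mdes(var_teacher):.2f} SD")
```

With these assumptions the MDES falls from roughly 0.80 SD to roughly 0.35 SD when teachers are randomized. The normal approximation is optimistic with only six schools, and the calculation says nothing about contamination, which would tend to shrink the observed difference between conditions.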

When your sample is composed of clusters of units (for example, schools are clusters of teachers, and teachers have clusters of students), the default procedures for calculating many statistics are not valid without modification. Statistics packages such as SPSS and SAS require the analyst to specify how the different levels of data are related in order to compute correct confidence intervals and significance levels. Failing to account for the nested or clustered nature of the data is likely to produce an inflated rate of Type I errors (false positives).
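A small simulation illustrates the Type I error problem. It assumes a hypothetical design with 10 classrooms per condition, 15 students per classroom, and an ICC of 0.20, and it generates data in which the intervention has no effect at all. It then compares a naive test that treats every student as independent with a pooled-variance t-test on classroom means, one simple analysis that respects the clustering (mixed-model procedures in packages such as SPSS and SAS are another). The critical value 2.101 is the two-sided 5% t value for 18 degrees of freedom.

```python
import random
from statistics import NormalDist

# HYPOTHETICAL design, chosen for illustration.
CLUSTERS_PER_ARM = 10
STUDENTS_PER_CLUSTER = 15
ICC = 0.20
Z_CRIT = NormalDist().inv_cdf(0.975)  # about 1.96
T_CRIT = 2.101  # two-sided 5% t critical value, df = 2 * 10 - 2 = 18


def mean(xs):
    return sum(xs) / len(xs)


def var(xs):
    """Sample variance (n - 1 denominator)."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)


def draw_classroom(rng):
    """Students share a classroom effect, so their scores are correlated."""
    shared = rng.gauss(0.0, ICC ** 0.5)
    return [shared + rng.gauss(0.0, (1 - ICC) ** 0.5)
            for _ in range(STUDENTS_PER_CLUSTER)]


def type_i_rates(n_reps=2000, seed=12345):
    """Rejection rates of two tests when there is truly no effect."""
    rng = random.Random(seed)
    naive = clustered = 0
    for _ in range(n_reps):
        # Both arms come from the same model: every rejection is a Type I error.
        arm_a = [draw_classroom(rng) for _ in range(CLUSTERS_PER_ARM)]
        arm_b = [draw_classroom(rng) for _ in range(CLUSTERS_PER_ARM)]

        # Naive analysis: pool students and treat them as independent.
        a = [x for room in arm_a for x in room]
        b = [x for room in arm_b for x in room]
        se = (var(a) / len(a) + var(b) / len(b)) ** 0.5
        if abs(mean(a) - mean(b)) / se > Z_CRIT:
            naive += 1

        # Cluster-level analysis: pooled-variance t-test on classroom means.
        ma = [mean(room) for room in arm_a]
        mb = [mean(room) for room in arm_b]
        pooled = (var(ma) + var(mb)) / 2
        se = (pooled * 2 / CLUSTERS_PER_ARM) ** 0.5
        if abs(mean(ma) - mean(mb)) / se > T_CRIT:
            clustered += 1
    return naive / n_reps, clustered / n_reps


if __name__ == "__main__":
    naive_rate, cluster_rate = type_i_rates()
    print(f"naive student-level test: {naive_rate:.3f}")
    print(f"classroom-means t-test:   {cluster_rate:.3f}")
```

With these settings the naive test rejects the true null hypothesis in roughly 30% of replications rather than the nominal 5%, while the classroom-means test stays close to 5%.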

Similarly, many statistical power calculators on the World Wide Web make no allowance for clustered data. Power analysis with clustered data is more complicated than with non-clustered data (see http://sitemaker.umich.edu/group-based/optimal_design_software).
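The sketch below shows why power calculations that ignore clustering can mislead. Using a normal approximation for a two-arm design that randomizes classes, `clusters_needed` (a hypothetical helper) solves for the total number of classes needed to detect a standardized effect with 80% power. The effect size of 0.30 and the class size of 20 are assumptions for illustration.

```python
import math
from statistics import NormalDist


def clusters_needed(effect, m, icc, alpha=0.05, power=0.80):
    """Total clusters (split evenly between two arms) for the target power.

    Normal approximation to a two-arm cluster-randomized comparison of
    means, with standardized effect `effect` and m subjects per cluster.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    j = 4 * (icc + (1 - icc) / m) * multiplier ** 2 / effect ** 2
    j = math.ceil(j)
    return j + (j % 2)  # round up to an even number for equal arms


for icc in (0.0, 0.05, 0.10, 0.20):
    j = clusters_needed(effect=0.30, m=20, icc=icc)
    print(f"ICC = {icc:.2f}: {j} classes ({j * 20} students)")
```

Setting the ICC to zero, which amounts to ignoring the clustering, gives 18 classes (360 students); an ICC of 0.10 raises the requirement to 52 classes (1,040 students). Because the normal approximation is optimistic when there are few clusters, software built for clustered designs is preferable for final planning.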