Step 7: Identify which attributes need special attention in the design, and take necessary steps to ensure both adequate statistical power and an unbiased sample.

(P) = plan example
(R) = report example

The goal of most evaluations is to assess the effectiveness of an intervention within the evaluation design, and then to have reasonable assurance that the results will generalize beyond the conditions present in the design. To achieve this goal, the sample should, to the extent possible, be proportionally representative of the population (P). More precisely, it should proportionally represent the attributes that could affect outcomes (R). If your population is 60% female and gender is considered an important attribute (but you do not wish to draw separate conclusions about treatment for males and females), then your sample should also be 60% female. Without such a match, your evidence may be confounded by the bias that results when certain values are over- or underrepresented and interactions are present. This bias leaves your evaluation unable to determine whether a particular outcome generalizes to the full population. For example, if the evaluation includes only female students, who gain an average of 10 points under the intervention condition, you cannot conclude that a general implementation of the intervention for both male and female students would result in a gain of 10 points.

Example of a biased sample

A teacher training project aims to get teachers to adopt new instructional methods, and the stakeholders want it to be effective for all teachers, regardless of how long they have been teaching. However, a questionnaire sent to training participants ends up being filled out only by participants who have been teaching for 5 years or more, a group that happens to make up less than 50% of the population. Because of this bias, the evaluators' findings about the project's effectiveness cannot be generalized to the entire population: the evaluators have no basis for judging whether the results apply to teachers with fewer than 5 years of experience.

Simple Random Sampling

When units of a sample are drawn randomly from the population of interest, the proportions of people belonging to various important subgroups will tend to be similar to those in the population itself. For example, if a population of teachers is 70% female, a random sample of these teachers will tend to be about 70% female as well. A key feature of simple random sampling is that all members of the population are equally likely to be sampled.
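As a minimal sketch of the idea, the following Python snippet draws a simple random sample from a made-up population of 1,000 teachers, 70% of whom are female. The population, the sample size of 200, the seed, and the helper name `simple_random_sample` are all assumptions chosen for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: 1,000 teachers, 70% female ("F"), 30% male ("M").
population = ["F"] * 700 + ["M"] * 300

sample = simple_random_sample(population, 200, seed=1)
share_female = sample.count("F") / len(sample)
print(f"Female share in sample: {share_female:.0%}")
```

The printed share will usually land near 70%, but it varies from draw to draw. That variation is what the next sections address.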

There are two common situations where simple random sampling alone may not be the most efficient way to draw a sample. The first is when you have a relatively small sample size and it is important to maintain a representative balance among subgroups. The second is when a subgroup is relatively rare, but you would like to provide reliable statistics about that subgroup. In both cases the solution is to employ a form of stratified random sampling.

Small samples and stratified sampling

The problem with small samples is that random fluctuations in a small number of units can significantly distort the proportional representation of subgroups. For example, if we were to repeatedly draw samples of 1,000 teachers from a population that is 70% female, in roughly 95% of cases our sample would contain between 670 and 730 female teachers, or 67% to 73% female representation. However, if we drew repeated samples of only 10 teachers from the same population, in roughly 95% of cases our samples would contain from 4 to 10 female teachers, or 40% to 100% female representation.
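The ranges quoted above are approximate. One way to check how much probability each range covers is an exact binomial calculation, sketched below. It assumes the population is large enough that each draw can be treated as an independent 70% chance of selecting a female teacher; the helper names `binom_pmf` and `prob_between` are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_between(lo, hi, n, p):
    """P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))

# Chance that a sample of 1,000 contains 670 to 730 female teachers.
p_large = prob_between(670, 730, 1000, 0.7)
# Chance that a sample of 10 contains 4 to 10 female teachers.
p_small = prob_between(4, 10, 10, 0.7)

print(f"n=1000, 670-730 female: {p_large:.3f}")
print(f"n=10,   4-10 female:    {p_small:.3f}")
```

Both probabilities come out somewhat above 95%. The contrast to notice is the width of the ranges: a 6-point spread for the large sample against a 60-point spread for the small one.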

Stratified random sampling first defines subgroups (or strata) and the desired proportion of representation for each. In the example above (drawing a sample of 10 teachers), if we know that 70% of teachers in the population are female, we would randomly draw exactly 7 teachers from a pool of candidate female teachers and 3 from a pool of candidate male teachers. As long as each teacher is drawn randomly from within a stratum (in this case, the pool of male or female teachers), the sample will be representative with respect to the stratifying attribute.
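The procedure can be sketched in a few lines of Python. The pools of 700 female and 300 male teachers, the teacher IDs, and the function name `stratified_sample` are hypothetical. The allocation uses simple rounding, which works for this 7/3 split but may not sum exactly to the requested total for other proportions.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Allocate n proportionally across strata, then draw randomly within each.

    strata maps a stratum name to the list of candidate units in that stratum.
    Assumption: simple rounding is used for allocation, so the per-stratum
    counts may not sum to n when the proportions do not divide evenly.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(n * len(units) / total)      # proportional allocation
        sample[name] = rng.sample(units, k)    # random draw within the stratum
    return sample

# Hypothetical pools: 700 female and 300 male teachers (70% / 30%).
strata = {
    "female": [f"F{i}" for i in range(700)],
    "male": [f"M{i}" for i in range(300)],
}

sample = stratified_sample(strata, 10, seed=42)
print({name: len(units) for name, units in sample.items()})
# {'female': 7, 'male': 3}
```

Unlike a simple random sample of 10, every draw from this procedure has exactly 7 female and 3 male teachers; only which individuals are chosen varies.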

Under-represented subgroups and over-sampling

When studying subgroups with relatively sparse representation, simple random sampling may not provide enough members of the subgroup for calculating reliable statistics. For example, Native American students represent approximately 1.5% of students nationwide. In a national random sample of 1,000 students, on average only 15 would be expected to be Native American. Fifteen students is a very small sample from which to generalize to the Native American population at large.

A version of stratified random sampling known as over-sampling can address this problem. If we are interested in studying attributes of Native American students in particular, we can conduct a power analysis to determine how many Native American students we should sample. Assume, for example, that we would need 100 Native American students to have sufficient power for our design, and that we have the budget to sample 1,000 students in total. We would then sample 100 Native American students from the pool of available Native American students, and 900 students from the remainder of the population.
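A rough sketch of this draw in Python follows. The sampling frame of 100,000 student IDs (1.5% Native American), the ID format, and the function name `oversample` are assumptions made for illustration; the target of 100 stands in for the result of a power analysis.

```python
import random

def oversample(target_pool, rest_pool, n_target, n_total, seed=None):
    """Draw n_target units from the rare subgroup and the rest from everyone else."""
    rng = random.Random(seed)
    target = rng.sample(target_pool, n_target)
    rest = rng.sample(rest_pool, n_total - n_target)
    return target, rest

# Hypothetical frame: 100,000 students, 1.5% Native American.
native_pool = [f"NA-{i}" for i in range(1_500)]
other_pool = [f"OT-{i}" for i in range(98_500)]

native, others = oversample(native_pool, other_pool, n_target=100, n_total=1_000, seed=0)
sample_share = len(native) / (len(native) + len(others))
print(f"Native American share of sample: {sample_share:.0%}")  # 10%, versus 1.5% in the frame
```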

The resulting sample will not be proportionally representative of the population at large, since our sample now has 10% Native American representation. We have in fact over-sampled Native Americans in order to draw a large enough sample of this subgroup. When we are studying attributes of Native Americans only, we conduct our usual analyses on the subsample of 100. When studying the sample of students as a whole, we have to apply sampling weights to the Native American and non-Native American students in order to restore the proportional representation of each group. A statistician can compute the appropriate sampling weights for your particular circumstance, and indicate how these weights should be incorporated in statistical analyses.
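To make the weighting step concrete, here is a minimal sketch in Python. It assumes the simplest form of weight, each group's population share divided by its sample share, and uses invented outcome scores (40 for every Native American student, 50 for everyone else) so the arithmetic is easy to follow. The function names are illustrative, and weights for a real study may need further adjustments that a statistician would determine.

```python
def sampling_weights(pop_share, sample_counts):
    """Weight per group = population share / sample share (a simple design weight)."""
    n = sum(sample_counts.values())
    return {g: pop_share[g] / (sample_counts[g] / n) for g in sample_counts}

def weighted_mean(values, groups, weights):
    """Mean of values where each unit counts in proportion to its group's weight."""
    num = sum(weights[g] * v for v, g in zip(values, groups))
    den = sum(weights[g] for g in groups)
    return num / den

pop_share = {"native": 0.015, "other": 0.985}    # shares in the population
sample_counts = {"native": 100, "other": 900}    # the over-sampled design
weights = sampling_weights(pop_share, sample_counts)

# Invented scores for illustration only: 40 for one group, 50 for the other.
groups = ["native"] * 100 + ["other"] * 900
scores = [40.0] * 100 + [50.0] * 900

unweighted = sum(scores) / len(scores)                # 49.0, over-counts the subgroup
weighted = weighted_mean(scores, groups, weights)     # about 49.85, matches population mix
print(f"unweighted: {unweighted:.2f}  weighted: {weighted:.2f}")
```

Each Native American student receives a weight of 0.15 and each other student a weight of about 1.09, so the weighted mean reflects the 1.5% / 98.5% split of the population rather than the 10% / 90% split of the sample.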