Step 7: Identify which attributes need special attention in the design, and take necessary steps to ensure both adequate statistical power and an unbiased sample.
(P) = plan example
(R) = report example
The goal of most evaluations is to assess the effectiveness of an intervention within the
evaluation design, and then to have a reasonable degree of assurance that those results will
generalize beyond the conditions present in the design. To achieve this goal, the sample
should, to the extent possible, be proportionally representative of the population (P).
More precisely, it should proportionally represent the attributes that could affect
outcomes (R). If your population is 60% female and gender is considered an important
attribute (but you do not wish to draw separate conclusions about treatment for males and
females), then your sample should also be 60% female. Without such a match, your evidence
may be confounded by the bias that results when certain values are over- or
underrepresented and interactions are present. This bias can leave your evaluation unable
to determine whether a particular outcome generalizes to the full population. For example,
if the evaluation includes only female students, who gain an average of 10 points under the
intervention condition, then you cannot conclude that implementing the intervention for
both male and female students would also result in a gain of 10 points.
Example of a biased sample
A teacher training project is aimed at getting teachers to adopt new instructional
methods, and the stakeholders want it to be effective for all teachers, regardless of how
long they have been teaching. However, a questionnaire sent to training participants ends
up being filled out only by participants who have been teaching for 5 years or more, a
group that happens to make up less than 50% of the population. Because of this bias, the
evaluators' findings about the project's effectiveness cannot be generalized to the entire
population: the evaluators cannot tell whether the results would apply to teachers with
fewer than 5 years of experience.
Simple Random Sampling
When units of a sample are drawn randomly from the population of interest, the
proportion of people belonging to various important subgroups will tend to be
similar to those in the population itself. For example, if a population of teachers
is 70% female, a random sampling of these teachers will tend to produce a sample that
is also 70% female. A key feature of simple random sampling is that all members of the
population are equally likely to be sampled.
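As a rough illustration of the idea above, the following minimal sketch draws a simple random sample with Python's standard library. The population list, the sample size of 200, and the helper name simple_random_sample are hypothetical choices made for this example, not part of any prescribed procedure.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical population of 1,000 teachers, 70% of whom are female.
teachers = ["F"] * 700 + ["M"] * 300

sample = simple_random_sample(teachers, 200, seed=1)
share_female = sample.count("F") / len(sample)
print(f"Female share in sample: {share_female:.0%}")
```

With a sample of this size, the female share will usually land close to 70%, though not exactly on it.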
There are two common situations where simple random sampling alone may not be the most
efficient way to draw a sample. The first is when you have a relatively small sample size
and it is important to maintain a representative balance among subgroups. The second
case is when a subgroup is relatively rare, but you would like to provide reliable
statistics about that subgroup. In both cases the solution is to employ a form of
stratified random sampling.
Small samples and stratified sampling
The problem with small samples is that random fluctuations in a small number of units
can significantly distort the proportional representation of subgroups. For example, if
we were to repeatedly draw samples of 1,000 teachers from a population that is 70%
female, in roughly 95% of cases our sample would contain between 670 and 730 female
teachers, or 67% to 73% female representation. However, if we drew repeated samples of
only 10 teachers from the same population, it would take a range of 4 to 10 female
teachers, or 40% to 100% representation, to cover at least 95% of cases.
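The ranges quoted above can be checked against the binomial distribution. The sketch below is one way to do so; it assumes each teacher is drawn independently from a very large population that is 70% female, and the function name binom_prob_between is a hypothetical helper introduced here.

```python
from math import comb


def binom_prob_between(n, p, lo, hi):
    """Return P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


# Probability that a sample of 1,000 contains 670 to 730 female teachers.
large = binom_prob_between(1000, 0.7, 670, 730)

# Probability that a sample of 10 contains 4 to 10 female teachers.
small = binom_prob_between(10, 0.7, 4, 10)

print(f"n=1000, 670 to 730 female: {large:.3f}")
print(f"n=10,   4 to 10 female:    {small:.3f}")
```

Both probabilities come out at or above 0.95, but the range needed to reach that coverage is 6 percentage points wide for the large sample and 60 points wide for the small one.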
Stratified random sampling first defines subgroups (or strata) and the desired
proportion of representation. In the above example (drawing a sample of 10 teachers), if
we know that the overall proportion of female teachers in the population is 70%, we
would randomly draw exactly 7 teachers from a pool of candidate female teachers, and 3
teachers from a pool of candidate males. As long as each teacher is randomly
drawn from within a particular stratum (in this case, pools of male or female teachers),
the sample will be representative with respect to that attribute.
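The 7-and-3 draw described above might be sketched as follows. The pool contents, the stratum names, and the helper stratified_sample are assumptions made for illustration; a real design would take its strata and allocation from the evaluation plan.

```python
import random


def stratified_sample(strata, allocation, seed=None):
    """Draw a fixed number of units at random from each stratum.

    strata:     dict mapping stratum name -> list of candidate units
    allocation: dict mapping stratum name -> number of units to draw
    """
    rng = random.Random(seed)
    sample = []
    for name, count in allocation.items():
        # Sampling is random within the stratum, without replacement.
        sample.extend(rng.sample(strata[name], count))
    return sample


# Hypothetical candidate pools of teachers.
pools = {
    "female": [f"F{i}" for i in range(70)],
    "male": [f"M{i}" for i in range(30)],
}

drawn = stratified_sample(pools, {"female": 7, "male": 3}, seed=0)
print(drawn)
```

Because the allocation is fixed in advance, every sample contains exactly 7 female and 3 male teachers, whichever individuals happen to be drawn.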
Under-represented subgroups and over-sampling
When studying subgroups with relatively sparse representation, simple random sampling
may not provide enough of the subgroup for calculating reliable statistics. For example,
Native American students represent approximately 1.5% of students nationwide. In a
national random sample of 1,000 students, on average only 15 would be expected to be
Native American. Fifteen students is a very small sample from which to generalize to the
Native American population at large.
A version of stratified random sampling known as over-sampling can address this
problem. If we are interested in studying attributes of Native American students in
particular, we can conduct a power analysis to determine how many Native American
students we should sample. Assume, for example, we would need 100 Native American
students to have sufficient power for our design, and we have the budget to sample 1,000
students total. We would then sample 100 Native American students from the pool of
available Native Americans, and 900 students from the remainder of the population.
The resulting sample will not be proportionally representative of the population at
large, since our sample now has 10% Native American representation. We have in
fact over-sampled Native Americans in order to draw a large enough sample of this
subgroup. When we are studying attributes of Native Americans only, we conduct our
usual analyses on the subsample of 100. When studying the sample of students as a
whole, we have to apply sampling weights to the Native American and non-Native
American students in order to restore the proportional representation of each group.
A statistician can compute the appropriate sampling weights for your particular
circumstance, and indicate how these weights should be incorporated in statistical
analyses.
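To give a sense of what such weights look like, here is a minimal sketch for the example above. It assumes the simplest possible scheme, in which each group's weight is its population share divided by its sample share; the group labels and the function sampling_weights are hypothetical, and an actual study may call for a more elaborate weighting method.

```python
def sampling_weights(population_share, sample_counts):
    """Weight for each group = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_share[group] / (sample_counts[group] / total)
        for group in sample_counts
    }


# Shares and counts taken from the example in the text.
pop_share = {"native_american": 0.015, "other": 0.985}
counts = {"native_american": 100, "other": 900}

weights = sampling_weights(pop_share, counts)

# Check: after weighting, the Native American share returns to 1.5%.
weighted_total = sum(counts[g] * weights[g] for g in counts)
restored_share = counts["native_american"] * weights["native_american"] / weighted_total

print(weights)
print(f"Weighted Native American share: {restored_share:.3%}")
```

Under this scheme each Native American student counts for 0.15 of a unit and each remaining student for about 1.09, so the over-sampled group no longer dominates whole-sample estimates.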