Step Two: Identify a set of observation techniques that are responsive to your evaluation questions and the logistical constraints of your setting.
Before you can identify all the potential components of your observation instrument, it helps to become familiar with five important dimensions that underlie all observation instruments: (1) lower-inference judgments vs. higher-inference judgments, (2) event sampling vs. time sampling, (3) all-inclusive subjects vs. targeted subjects, (4) real time vs. post hoc, and (5) quantitative vs. qualitative. Any observation item can be characterized in terms of its placement on each of these dimensions.
Lower-Inference Judgments vs. Higher-Inference Judgments
Items on an observation instrument can be characterized by the types of inferences that the observer must make. "Lower-inference" items are those that capture more easily identifiable or surface features of behavior or events (e.g., the gender composition of a small group, the number of times a student raises his or her hand to answer teacher questions). These kinds of items are relatively easy to develop and code because they involve features more easily defined and agreed on. Lower-inference items, however, are not suited to capturing more complex or unexpected features or events.
"Higher-inference" items involve complex judgments about more global, intangible characteristics of a person or situation (e.g., the grade-level appropriateness of a lesson, the organizational clarity of a presentation). Higher-inference coding requires observers to have a way to "unpack" the meaning of a category (as the meaning is defined by the evaluation team) and a strategy for weighing the different components to arrive at an overall judgment. These items are more difficult to construct, and agreement among observers is harder to achieve.
Here are two examples of observation items contrasting lower-inference and higher-inference judgments.
The number of times the target student returned to the homepage while using the online tutor.
(Fill in number)
Figure 1. Example of lower-inference judgment item
5. Rate the following aspects of lesson presentation:

                            Very High   High   Moderate   Low   Very Low
Organization of Content         o         o       o        o       o
Focus on Key Concepts           o         o       o        o       o
Use of Relevant Examples        o         o       o        o       o
Figure 2. Example of higher-inference judgment items
Event Sampling vs. Time Sampling
This distinction relates to the recordings of "what" and "how often" that are typical elements of most observation instruments. Observation items that sample events are those focused on certain types of events to the exclusion of others. For example, if your evaluation concerns student-to-student interactions during science labs, observers might have to sit through a block of class time, only some of which might be devoted to a science lab. They would limit their recording to the lab time (e.g., coding different elements of students’ interactions). Event sampling is likely to capture highly representative data within the event boundaries of interest, although this approach may require very intense coding on the part of the observer (when the events of interest are taking place). In short, event sampling makes sense when you are interested in a relatively narrow and specific context of observable events in a setting.
In contrast, time sampling involves characterizing whatever behaviors or events occur during blocks of time. For example, the observer of an hour-long training session might be asked to observe four different 10-minute blocks and fill out the same set of observation ratings following each block of time. A different approach would involve taking "minute sweeps" during a classroom session. Here, an observer would look about the classroom for several seconds and then make codings (necessarily brief) about what just occurred. The observer would repeat this concentrated process every minute for a predetermined block of time. Doing a set of such codings repeatedly has the advantage of giving you many data points and thus typically increases the reliability of data. The challenge of time-sampling coding involves making sure that your coding categories anticipate all the possible kinds of events of interest that could occur.
Below are examples of an event-sampling observation item and a time-sampling observation item.
Whenever the teacher praises an individual student, note the nature of the praise (e.g., helpful to another student, paying attention to lesson, good explanation when answering a question).
Instance    Nature of Praise
1
2
3
...
Figure 3. Example of event-sampling item
(The following item is repeated regularly throughout the observation protocol, with the expectation that the observer will fill it out once every 10 minutes.)
Percent of students off task (pick the range representing the proportion of students not doing the assigned task/activity during the prior 10 minutes):
<10%
10-33%
34-66%
67% or more
Figure 4. Example of time-sampling item
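To illustrate how repeated time-sampling codings accumulate into many data points, here is a minimal Python sketch. It assumes, hypothetically, that the observer notes a head count of off-task students for each 10-minute block; the item in Figure 4 itself only asks the observer to pick a range. The function names `off_task_range` and `summarize_blocks` are illustrative, and the cut points used for values that fall between the printed ranges (for example, 33.5%) are an assumption.

```python
from collections import Counter


def off_task_range(off_task: int, total: int) -> str:
    """Map a head count to one of the range labels used in Figure 4.

    Assumption: values between the printed ranges (e.g. 33.5%) are
    assigned to the lower range.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of students")
    if not 0 <= off_task <= total:
        raise ValueError("off_task must be between 0 and total")
    pct = 100 * off_task / total
    if pct < 10:
        return "<10%"
    if pct < 34:
        return "10-33%"
    if pct < 67:
        return "34-66%"
    return "67% or more"


def summarize_blocks(blocks):
    """Count how many observation blocks fell into each range.

    `blocks` is an iterable of (off_task, total) pairs, one per
    10-minute block.
    """
    return Counter(off_task_range(off_task, total) for off_task, total in blocks)
```

For example, `summarize_blocks([(2, 25), (5, 25), (6, 25), (20, 25)])` tallies four blocks from one session, so the evaluator can see at a glance how often each range was recorded.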
All-Inclusive Subjects vs. Targeted Subjects
This distinction relates to "who" is being observed. Do you want to capture what all individuals in a setting are doing? Collecting observation data for all individuals is likely to represent what happens in the setting most accurately. The challenge with all-inclusive coding is that it often is difficult to see and hear all individuals in a setting, especially if you are trying to note when different individuals are involved. All-inclusive coding may be feasible if you have multiple observers in a complex setting or one observer in a setting with relatively few people (say, 10 or fewer) or if your codes involve making global estimates (e.g., the percentage of participants who do not appear to be doing the assigned task, as shown in Figure 4).
For larger settings, an alternative to observing all individuals is to focus on a manageable subset (again, a single observer may have difficulty identifying and keeping track of more than 10 individuals). Depending on your purpose, this targeted subset can be identified randomly (e.g., select two people from each row of seats) or purposefully (e.g., students preselected for achievement level and balanced for gender) and tracked through the length of the observation (e.g., see Figure 5). The risk with targeted coding is that the individuals selected may turn out not to be typical; also, if nontargeted individuals strongly influence the situation, you will have no record of their behavior. When observing a subset of individuals (random or purposeful), each individual typically gets assigned an identifier code for preserving anonymity, and space for this code is built into the observation items. Note that collecting observational data on individuals requires Human Subjects clearance (see Data Collection: Procedures, Schedule, and Monitoring).
In addition to your observations about the overall group reaction to the workshop,
you observed two target participants throughout the workshop. Please summarize below
the behavior of each target participant during the 90-minute workshop.
Figure 5. Example of an observation item that targets particular individuals
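The random selection of a targeted subset described above (for example, two people from each row of seats), together with the assignment of anonymous identifier codes, might be sketched as follows. This is only an illustration under stated assumptions: the function name `select_targets`, the seating-row input, and the "T01"-style code format are all hypothetical and not part of any prescribed procedure.

```python
import random


def select_targets(rows, per_row=2, seed=None):
    """Randomly pick up to `per_row` people from each row of seats.

    Returns a dict mapping an anonymous identifier code (hypothetical
    format "T01", "T02", ...) to the selected person. Only the codes
    would appear on the observation instrument; the mapping itself
    should be stored separately to preserve anonymity.
    """
    rng = random.Random(seed)  # a seed makes the selection reproducible
    targets = {}
    next_id = 1
    for row in rows:
        # Sample without replacement; short rows contribute everyone.
        for person in rng.sample(row, min(per_row, len(row))):
            targets[f"T{next_id:02d}"] = person
            next_id += 1
    return targets
```

Calling `select_targets(rows, per_row=2, seed=7)` with `rows` as a list of lists of names returns the code-to-person mapping. A purposeful selection (e.g., preselecting by achievement level) would replace the random sampling step with the evaluator's own criteria.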
Real Time vs. Post Hoc
This distinction refers to whether the observer is recording behavior and events as they are occurring or is waiting for some time before recording. Contemporaneous coding makes sense when the observation focuses on many separate, small behaviors or events (e.g., features of individuals’ verbal exchanges) such that the observer could not possibly remember the details of each one unless he or she was coding along in real time. This kind of coding often requires intense concentration on the part of the observer, who is trying to code or write notes while keeping pace with the behaviors and events. In contrast, post hoc coding allows the observer to wait for an interval of time and to take in a collection of behaviors or events before making a coding judgment. For example, to rate the cognitive challenge of an in-class assignment, the observer would need to listen to how the teacher explains it and watch and listen as students react to the assignment. Clearly, post hoc coding is not well suited to capturing great detail in verbal exchanges or specific sequences of fast-paced events.
If an observation instrument involves many post hoc items requiring global ratings or descriptions (e.g., writing a description of the room), it usually is desirable to structure the instrument so that there is enough time for observers to record while they still are in the setting. This approach minimizes memory loss or overload. On the other hand, sometimes evaluators want observers to wait until they have left the setting to complete some overall judgment items and descriptions (e.g., trying to avoid premature judgments before the entirety of the time block has been observed). In these situations, observers should complete their recordings on the same day as their visits.
The example items presented thus far (Figures 1-4) are suitable for completing in real time during the observation; the item in Figure 5 may best be completed after the observation ends (especially if a lengthy description is called for). The rating items (Figure 2) also are candidates for post hoc completion, although they require little time and could be filled out immediately after the observation time is up.
Quantitative vs. Qualitative
Quantitative recording refers to completing observation items that can be translated immediately into numerical data (e.g., checking boxes, behavior tallying). Qualitative recording refers to writing information and descriptions that are not translatable into numerical data without the use of some method of qualitative analysis or that are not intended for reduction to numerical data (e.g., anecdotal records, running records, or scripting of classroom activities). Qualitative recording can range from writing short phrases to writing lengthy and rich descriptions.
Both quantitative and qualitative coding call for highly trained and skilled observers. For quantitative coding, observers need the experience necessary to make quick and consistent judgments in line with the coding definitions established by the evaluators. For qualitative coding, it is important that the observers be skilled in organized note taking and narrative writing and that they know how to avoid projecting personal biases into the accounts.
Whether a recording is quantitative or qualitative per se is less important than the underlying reasoning for the nature of the observation item. As already indicated, the quantitative vs. qualitative emphasis of an instrument will tend to correlate strongly with the degree of behavioral specificity stemming from the evaluation question—that is, if the evaluator can predefine behavior categories of interest and devise coding schemes for them, this situation naturally lends itself to quantitative data.
The quantitative-qualitative dimension of recording has significant practical implications. From the standpoint of the observer, having to spend a lot of time writing prose during an observation can interfere with the observer’s ability to focus on the setting and recognize new events. From the standpoint of the evaluator, a high proportion of qualitative recording usually entails significant time and effort for translating the recording into usable quantitative or qualitative data, and thus involves a much greater expense than direct quantitative analysis. However, if the phenomenon under study is not well understood, it may be necessary to record narrative descriptions of events, activities, and behaviors, since the components of the activities and behaviors have not been identified. In some cases, recording details of behaviors and events may provide evaluators with new insights or clues that can further the evaluation—for example, on issues that were not understood before the observation.