


Step One: Determine what features of the setting and behaviors you want to observe.

Before thinking about specific types of observation techniques and instruments, it is helpful to determine what observed features would best address your evaluation questions. Consider each of your evaluation questions and the range of implementation and outcome variables that could be addressed through observation (see Methodological Approach and Sampling). Do your evaluation questions imply measuring or counting discrete behaviors (e.g., the number of times students use a calibration tool, the percentage of science questions the teacher addresses to girls)? Or do your evaluation questions imply capturing many aspects of setting and behavior, including some that you might not be able to anticipate fully (e.g., the motivational aspects of students' encounters with new software)?

The former concern with discrete behaviors suggests that a quantitative observation instrument involving carefully defined categories and some type of tallying would be appropriate. The latter concern with more global variables suggests the appropriateness of either a quantitative observation instrument based on rubrics and observer judgment or a qualitative observation instrument prompting the observer for particular descriptions or a running narrative. Another way to think about this contrast is that for the former observation instrument, the decisions about what is and isn't important to observe are predetermined, along with the decision rules for interpreting and coding what is observed. The latter observation instrument calls for more interpretation on the part of the observer, some of which may take place after the fact.

Regardless of how specifically you can define the environmental and behavioral features of interest ahead of time, it is very important to establish some framework and boundaries for your observation protocol. Otherwise, you run the risk of ending up with observational data that vary wildly in depth, emphasis, and quality — the inherent risk in sending different individuals out into different, complex settings. In short, observation instruments are susceptible to yielding the least reliable (i.e., reproducible) data when compared with data generated by questionnaires and interviews. The more structured your observation instrument, the easier it will be to collect reliable data with that instrument. Bear in mind that even a highly defined and categorized quantitative instrument will involve substantial training to achieve agreement among observers as to what different categories mean.
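One common way to check whether observer training has produced the agreement described above is to have two observers code the same events and compute an agreement statistic such as Cohen's kappa, which corrects raw agreement for chance. The sketch below is a minimal illustration; the category labels and codes are invented for the example:

```python
from collections import Counter

# Hypothetical codes assigned by two trained observers to the same
# ten classroom events (categories and data invented for illustration).
obs_a = ["on-task", "off-task", "on-task", "on-task", "off-task",
         "on-task", "on-task", "off-task", "on-task", "on-task"]
obs_b = ["on-task", "off-task", "on-task", "off-task", "off-task",
         "on-task", "on-task", "on-task", "on-task", "on-task"]

n = len(obs_a)
# Observed agreement: proportion of events coded identically.
po = sum(a == b for a, b in zip(obs_a, obs_b)) / n
# Expected chance agreement, based on each observer's category frequencies.
freq_a, freq_b = Counter(obs_a), Counter(obs_b)
pe = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
# Cohen's kappa: agreement corrected for chance.
kappa = (po - pe) / (1 - pe)
print(f"observed agreement = {po:.2f}, kappa = {kappa:.2f}")
```

Here the observers agree on 8 of 10 events, but because both code most events "on-task," some of that agreement is expected by chance; kappa discounts it accordingly.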

In the remainder of this section, we will first illustrate the risks associated with asking observers to produce a completely open-ended narrative as the sole observation approach. Then we will briefly illustrate how different kinds of evaluation questions imply different levels of specificity for the design of an observation instrument.

Limitations of the Open-Ended Narrative Technique

Here is an example of what can happen when the foci of an observation protocol are not articulated beforehand.

Two observers are asked to observe and take descriptive notes in the classroom of a fourth-grade teacher who has participated in a program designed to increase students' exposure to literature and connections to other elements of the curriculum. Both observers visit during the same 2-hour period, when the teacher presents a language arts lesson followed by a social studies lesson. Each observer is then asked to use his or her notes to write a narrative that captures the extent to which the teacher's classroom reflects the literature emphasis.

When completed, the narrative of the first observer portrays what appears to be a literature-rich environment. This observer describes how the classroom is decorated with posters about books and student book reports. He also describes how half of the language arts lesson is devoted to reading and discussing a historical novel, with the teacher making many interesting cultural references. Furthermore, this observer describes how the teacher connects the social studies topic to this novel, asking students to spend part of the lesson illustrating scenes from the novel on a group mural. The overall impression from this narrative is that the teacher creatively infuses literature into her classroom.

The narrative of the second observer is lengthier and portrays a different atmosphere. Although this narrative describes the environment and activities captured in the first narrative, this observer writes extensively about the chaotic management structure of the classroom. She notes that many students appear distracted during the two lessons and that the teacher is interrupted frequently by student misbehavior. When students are assigned to work on painting the mural, she relates how more than half of the students do not participate. Furthermore, she describes the presence of literature linkages in the classroom as largely superficial, writing that there is little evidence of higher-level tasks or student thinking about potential literary content and themes.

As you can see, sending two observers to record the same situation did not result in two similar, "objective" narratives. Nor would it be fair to say that one of the observers was necessarily "more right" than the other. What is certain is that each observer had a different way of filtering what he or she saw, and probably a different way of valuing it as well. The first observer was impressed with the creative energy of the class, not particularly bothered by students' being off task (or less willing to make a judgment of what constitutes off-task behavior), and not inclined to view the student misbehavior as serious. The second observer was more analytical about classroom management and the cognitive demands placed on students, clearly coming to the situation with a framework that judged these aspects of classroom life as very important. If one were to send these two observers out to different sets of classrooms, it would be difficult to compare their narratives because it would be unclear what was attributable to differences among classrooms versus what was attributable to differences between the observers in their inherent ways of viewing classrooms. This observer effect would pose a severe threat to an evaluation attempting to describe a program's impact on a group of teachers.

This example of an open-ended narrative is not intended to disparage the notion of observation narratives. There are many situations when gathering rich, largely unfiltered descriptions can be helpful, especially if observers have been trained to minimize their biases. Often these descriptions become the database for a second stage of analysis, in which the observer reexamines the data for specific evidence and then makes summary judgments (see, for example, Schorr, Firestone, & Monfils, 2003). Even when gathering rich descriptions, however, we would argue that the parameters of what to describe and how to describe it must be clearly defined. For an example of using a focused narrative for an evaluation, see the Case Study later in this module.

Different Degrees of Specificity

The module Methodological Approach and Sampling addresses how evaluation questions can be categorized according to whether they ask about changes in implementation (the processes of carrying out a program) or changes in outcomes (the intended results of a program). This framework can help you define the nature of the changes you are interested in. Furthermore, once you can define these changes, it is considerably easier to develop a set of practical evaluation questions that can be addressed. If the questions can be answered in full (or part) by observable features of setting and behavior, one has a beginning framework for specifying the kinds of features worth observing.

As an example, the table below contrasts four evaluation questions that point to collecting observational data (asking participants to self-report on the behavior would be a fallback only if there were no budget for observations). As indicated in the third column, each question can be answered by observing different combinations of features, some that can be well specified ahead of time and others that are best left more open-ended. These different levels of specificity have implications for using different observation techniques.

Table 1. Observation Specificity Associated with Different Evaluation Questions

Type of Evaluation: Implementation
Evaluation Question: Are students showing more of the desired behaviors while using the new curriculum than they were before using the new curriculum?
Specificity of Observation: Specific target behaviors are operationally defined and probably could be counted or coded in some fashion. Narrative detail should be collected.

Type of Evaluation: Implementation
Evaluation Question: How is implementation of the new curriculum affecting the classroom management structures that teachers already had in place?
Specificity of Observation: It may be possible to specify some features of classroom management structures ahead of time for coding or rating. Additionally, there may be unanticipated complexities that need to be allowed for and captured with substantial narrative detail.

Type of Evaluation: Outcome
Evaluation Question: Can individual students give a better class presentation after participating in five mini-sessions on public speaking than they could before participating in the mini-sessions?
Specificity of Observation: A number of public speaking features (drawn from the mini-sessions) could be the basis for coding or rating behavior (including that of the audience). Narrative detail could be collected.

Type of Evaluation: Outcome
Evaluation Question: Are teachers better able to distribute their questioning opportunities equally among boys and girls after attending a gender sensitivity workshop than they were before the workshop?
Specificity of Observation: A straightforward tally of teacher questions directed at boys and at girls can answer this question. Some complexity may be introduced via coding categories if there is a concern about the nature of the questions being asked.
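The tally-with-coding-categories approach in the last row of the table can be sketched in code. This is a hypothetical illustration (the observation records and the coding categories are invented for the example), assuming each teacher question is logged as a pair of the addressed student's gender and a question-type code:

```python
from collections import Counter

# Hypothetical observation log: one (gender, question-type) record
# per teacher question observed during a lesson.
observations = [
    ("girl", "recall"), ("boy", "recall"), ("boy", "open-ended"),
    ("girl", "open-ended"), ("boy", "recall"), ("girl", "recall"),
    ("boy", "recall"), ("girl", "open-ended"),
]

# Simple tally: how many questions went to boys versus girls.
by_gender = Counter(gender for gender, _ in observations)
# Finer-grained tally: gender crossed with question type.
by_gender_and_type = Counter(observations)

total = sum(by_gender.values())
for gender, count in sorted(by_gender.items()):
    print(f"{gender}: {count} questions ({100 * count / total:.0f}%)")
for (gender, qtype), count in sorted(by_gender_and_type.items()):
    print(f"{gender} / {qtype}: {count}")
```

The simple tally answers the evaluation question directly (here, an even 50/50 split), while the crossed tally supports the additional concern about the nature of the questions, such as whether girls receive proportionally fewer open-ended questions.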

Table 1 demonstrates how the nature of your evaluation questions can already shape your search for the best candidates among different kinds of observation techniques. In the following sections, we describe and illustrate the range and suitability of these techniques.