Step 7. Identify potential risks to the accuracy of the data you will collect. To minimize these risks, decide whether it is necessary to implement procedures that enhance the trustworthiness of the data, such as training programs for data collectors (R).
|
Trustworthy data are data that are both valid and reliable. Validity is achieved when the data measure the characteristics you intend them to measure. Reliability is achieved when the results of the data collection would be reproducible under repeated administrations, ratings, or codings. The following list contains common causes of measurement error that threaten the validity and reliability of results.
- The individuals from whom you seek data have characteristics that prevent them from understanding the intent of your questions (for example, unfamiliarity with the vocabulary used in the questions).
- Data collection procedures are poorly conceived or poorly implemented. Examples include
confusing or inconsistent directions to respondents, or failure of interviewers or observers to
follow the directions specified in their protocols.
- The settings in which people respond prevent them from paying enough
attention to the questions. Examples are settings that are excessively hot, cold, or
noisy.
- Data interpretation procedures are inadequate. Examples are the application of
ambiguous, confusing, or misdirected rating (e.g., scoring) or coding criteria.
The following paragraphs suggest training procedures and other strategies for preventing the forms of measurement error just described.
- Use of guidelines and backup procedures for administration of evaluation instruments.
Some questionnaires and learning assessments are administered to respondents in live situations.
Sometimes this is easy, as when a paper feedback questionnaire is administered to adults before they exit a workshop; training for such an administration may take only a few minutes. At other times it is more challenging because of complexities built into the delivery of the instrument or because of technical problems. An
example would be an assessment of young students’ learning that requires the students to work in pairs at
computer stations and log on to the Internet. Many things can go wrong in such a situation. The computers
may malfunction, Internet access may be unacceptably slow, and some students may have to
work alone because their partners are absent. Backup strategies may need to be built into the
administrative procedures of the assessment to prevent problems from resulting in a failure to
collect data. Thorough guidelines may need to be developed to address these complexities. The more
detailed the guidelines are, however, the greater the risk that the instrument administrators will make errors if they are not sufficiently trained.
- Use of corroboration and taping when carrying out observations and interviews. Observations and
interviews require that data collectors have the skill to collect information and maintain fidelity to
the protocols that have been designed for them to record information. The errors that can result from
lack of fidelity to the protocols can be offset through corroboration. Corroboration can be achieved
when multiple data collectors observe the same phenomena or take notes at the same interviews. If
multiple data collectors cannot be present, interviews can be audio-recorded and observations can be videotaped. Taping allows multiple data collectors to review the same events repeatedly and asynchronously.
- Use of coding and rating of information. In data analysis, coding is the ascription of standardized interpretations to raw information. An example would be the coding of a particular student response in a classroom discussion as demonstrative of a particular cognitive process. Rating is a form of coding in which a standardized value is assigned to student work, such as a student's answer to an open-ended question on a learning assessment. Coding and rating make it possible to classify, aggregate, and summarize results. Table 2 below identifies the types of artifacts that are coded and rated, and the types of standardized criteria used for rendering coding and rating decisions; a sketch of applying such a coding scheme follows the table.
Table 2. Types of artifacts and interpretive criteria used in training raters and coders
| Instrument | Artifact | Interpretive criteria |
| --- | --- | --- |
| Learning assessment | Student work | Rating rubric |
| Observation protocol | Field notes | Coding scheme |
| Interview protocol | Transcripts | Coding scheme |
| Questionnaire | Respondent form | Coding scheme |
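To make the idea of a coding scheme concrete, here is a minimal sketch in Python of applying keyword-based codes to interview transcript excerpts. The codes, keywords, and excerpts are hypothetical illustrations, not part of any particular protocol; a real scheme would come from the evaluation's coding manual, and human coders would resolve the cases a simple keyword match cannot.

```python
# A minimal sketch of applying a coding scheme to interview transcripts.
# The codes and keyword rules are hypothetical illustrations; a real
# scheme would come from the evaluation's coding manual.
CODING_SCHEME = {
    "collaboration": ("partner", "group", "together"),
    "confusion": ("lost", "don't understand", "unclear"),
}

def code_excerpt(excerpt):
    """Return every code whose keywords appear in the excerpt."""
    text = excerpt.lower()
    matches = [code for code, keywords in CODING_SCHEME.items()
               if any(kw in text for kw in keywords)]
    return matches or ["uncoded"]  # flag excerpts needing human judgment

transcript = [
    "I was lost until my partner explained the directions.",
    "We finished the task together.",
]
for excerpt in transcript:
    print(code_excerpt(excerpt), "<-", excerpt)
```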
- Use of rater and coder training. Reliability in coding and rating occurs
when different coders and raters render the same decisions. If an item is valid (see the Instrument
Triangulation and Selection module for more on validity) and the rating or coding criteria are
comprehensive and clearly communicated, it should be possible to achieve coding and rating
reliability as long as the raters and coders have the skills to properly apply the criteria.
This requires training, which is the subject of Step 8.
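As a concrete illustration of checking whether different coders render the same decisions, the sketch below computes simple percent agreement and Cohen's kappa for two coders. The codes are hypothetical, and kappa is only one common chance-corrected agreement statistic; the text does not prescribe a particular measure, so treat this as one reasonable option.

```python
# A minimal sketch of an inter-coder reliability check. The observation
# codes below are hypothetical; Cohen's kappa is used for illustration.
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders over the same artifacts."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected chance agreement from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

coder1 = ["on-task", "on-task", "off-task", "on-task", "off-task"]
coder2 = ["on-task", "off-task", "off-task", "on-task", "off-task"]
agreement = sum(a == b for a, b in zip(coder1, coder2)) / len(coder1)
print(f"agreement = {agreement:.2f}")              # 0.80 here
print(f"kappa     = {cohens_kappa(coder1, coder2):.2f}")  # 0.62 here
```

A kappa noticeably lower than raw agreement can signal that coders are defaulting to a dominant code rather than applying the criteria, which is exactly the kind of problem the training described in Step 8 is meant to address.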