OERL: Online Evaluation Resource Library

Pointer to Evaluation Articles

Ethics Theme

2006

American Journal of Evaluation

Cooksy, L. (2006). Ethical challenges. American Journal of Evaluation, 27(3),
     370-371.

Abstract
All evaluators face the challenge of striving to adhere to the highest possible standards of ethical conduct. Translating the American Evaluation Association’s (AEA) Guiding Principles and the Joint Committee’s Program Evaluation Standards into everyday practice, however, can be a complex, uncertain, and frustrating endeavor. Moreover, acting in an ethical fashion can require considerable risk taking on the evaluator’s part. In the Ethical Challenges column, commentators share their views of how evaluators might respond to specific problematic situations, linking their analyses to the principles and standards they believe are most relevant to the case. Not surprisingly, the perspectives that commentators offer are not always in agreement. When this occurs, reflection on the nature and sources of these differences of opinion can enhance our sensitivity to the ethical dimensions of our evaluation work and our awareness of the options available for addressing them.

Evaluation Journal of Australasia

Coryn, C. (2006). A conceptual framework for making evaluation support
     meaningful, useful and valuable. Evaluation Journal of Australasia, 6(1),
     45-51.

Abstract
How does support differ from reporting and dissemination? How, and to what extent, do existing evaluation theories attend to support? How, and for what purposes, should support be provided? Does support imply that evaluators have ethical, moral, professional, contractual or other obligations to provide support? Is support an evaluation service, or should it be considered an evaluator competency or skill? Or, as with the traditional research paradigm, should evaluators merely let their reports speak for them? While the Key Evaluation Checklist (KEC) and other evaluation theories and approaches have provided a conceptual basis for evaluation support, further clarification is necessary in order to make support an integral part of evaluation practice. The evaluation support construct as posited here necessitates means that are direct and indirect, technical and general, and includes alternative scenarios for the purposes of advocating for, assisting and helping evaluands, clients, stakeholders, and audiences and users of evaluation.

Learning and Assessment Theme

2006

Measurement: Interdisciplinary Research and Perspectives

Smith, C., Wiser, M., Anderson, C., & Krajcik, J. (2006). Implications of research on
     children's learning for standards and assessment: A proposed learning
     progression for matter and the atomic-molecular theory.
     Measurement: Interdisciplinary Research and Perspectives, 4(1&2), 1-98.

Abstract
The purpose of this article is to suggest ways of using research on children's reasoning and learning to elaborate on existing national standards and to improve large-scale and classroom assessments. The authors suggest that learning progressions—descriptions of successively more sophisticated ways of reasoning within a content domain based on research syntheses and conceptual analyses—can be useful tools for using research on children's learning to improve assessments. Such learning progressions should be organized around central concepts and principles of a discipline (i.e., its big ideas) and show how those big ideas are elaborated, interrelated, and transformed with instruction. They should also specify how those big ideas are enacted in specific practices that allow students to use them in meaningful ways, enactments the authors describe as learning performances. Learning progressions thus can provide a basis for ongoing dialogue between science learning researchers and measurement specialists, leading to the development of assessments that use both standards documents and science learning research as resources and that will give teachers, curriculum developers, and policymakers more insight into students' scientific reasoning. The authors illustrate their argument by developing a learning progression for an important scientific topic—matter and atomic-molecular theory—and using it to generate sample learning performances and assessment items.

Studies in Educational Evaluation

Clark, C., & Rust, F. (2006). Learning-centered assessment in teacher education.
     Studies in Educational Evaluation, 32(1), 73-82.

Abstract
The collection of which this article is a part is entitled "Functions of Assessment in Teacher Education." The other papers in this set draw our attention to multiple functions of assessment including peer evaluation, promoting reflection, use of technology in demonstrating teacher competence, and validating new national standards of excellence among veteran teachers. Our contribution to the conversation is a modest one — a simple heuristic device that we have found useful in designing and administering assessments in teacher education programs in the USA. Our purpose is to give a brief description of the heuristic device and the assumptions that support it and then to illustrate how we have made use of it in our work as teacher educators.

2007

Studies in Educational Evaluation

Harlen, W. (2007). Criteria for evaluating systems for student assessment.
     Studies in Educational Evaluation, 33(1), 15-28.

Abstract
The assessment of students is used for various purposes within an assessment system. It has an impact on students, teaching, and the curriculum, the nature of this impact depending upon how it is carried out. In order to evaluate the advantages and disadvantages of particular assessment procedures, criteria need to be applied. This article discusses the criteria of construct validity, reliability, desired impact (consequential validity), and good use of resources, and applies them to assessment for formative and summative purposes. It concludes that for these purposes, the criteria are more readily met when there is greater use of teachers’ judgments in assessment rather than external tests.

Logic Modeling Theme

2006

Evaluation and Program Planning

Renger, R., & Hurley, C. (2006). From theory to practice: Lessons learned in the
     application of the ATM approach to developing logic models. Evaluation and
     Program Planning, 29(2), 106-119.

Abstract
The topic of logic models has received significant attention in the evaluation and social science literature, focusing either on the theory of logic models or methodology for program design. The evaluation and social science literature dedicated to logic models has been criticized for being overly complex and too difficult for practitioners to understand and utilize. Agencies such as the Kellogg Foundation and the United Way have championed initiatives to bridge the theory–application gap. They have done so by publishing simple, step-by-step instructions on how to create a logic model, intended primarily for those responsible for implementing human service programs. The difficulty with these prescriptive publications is that they unintentionally mislead the practitioner into believing that the task of creating a logic model is as simple as completing a one-page table. The understanding that logic modeling is a process, the results of which can then be summarized in a one-page table, is lost. This misunderstanding is partly due to the dearth of literature devoted to the logic model process. In 2002, a systematic three-step process for creating a logic model, coined the ATM approach, was published in an attempt to meet this need. Since its publication, the ATM approach has been used in a variety of settings. The purpose of this paper is to report on the practical lessons learned in the process of creating a logic model using the ATM approach.

2007

Evaluation and Program Planning

Nesman, T.M., Batsche, C., & Hernandez, M. (2007). Theory-based evaluation of a
     comprehensive Latino education initiative: An interactive evaluation approach.
     Evaluation and Program Planning, 30(3), 267-281.

Abstract
Latino student access to higher education has received significant national attention in recent years. This article describes a theory-based evaluation approach used with ENLACE of Hillsborough, a 5-year project funded by the W.K. Kellogg Foundation for the purpose of increasing Latino student graduation from high school and college. Theory-based evaluation guided planning, implementation, and evaluation through the process of developing consensus on the Latino population of focus, adoption of culturally appropriate principles and values to guide the project, and identification of strategies to reach, engage, and impact outcomes for Latino students and their families. The approach included interactive development of logic models that focused the scope of interventions and guided evaluation designs for addressing three stages of the initiative. Challenges and opportunities created by the approach are discussed, as well as ways in which the initiative impacted Latino students and collaborating educational institutions.

Evaluation and Statistical Methods Theme

2006

Canadian Journal of Program Evaluation

Cahill, I., Folkes, P., & Szabo, L. (2006). Simulating or imputing non-participant
     intervention durations using a flexible semi-parametric model. Canadian Journal
     of Program Evaluation, 21(2), 181-200.

Abstract
In the evaluation of labour market training programs using matching, evaluators must decide when to start comparing participant outcomes against non-participant outcomes. Measurement relative to an intervention period permits the separation of training opportunity costs from possible benefits, but an equivalent period for the comparison group must be determined. One method imputes the timing of the intervention for comparisons from that of the participant match. However, with Propensity Score Matching, this may produce biased outcome estimates. Instead, the authors develop and apply semi-parametric duration models using Human Resources and Social Development Canada data to simulate the positioning and duration of the intervention for non-participants.

Evaluation Journal of Australasia

Harvey, G., & Hurworth, R. (2006). Exploring program sustainability: Identifying
     factors in two educational initiatives in Victoria. Evaluation Journal of Australasia,
     6(1), 36-45.

Abstract
This paper examines two recent successful school-based health initiatives in Victoria, particularly in relation to factors that seem to foster program sustainability. These programs, dealing with drug education and healthy eating, are described before presenting two different methods (individual and group) used to determine elements that allow for the continuation of such projects. The findings on sustainability from each program are discussed using the broad areas of factors associated with the programs themselves; those associated with the context in which the programs were implemented; and finally, those factors external to the programs and their implementation contexts. These results indicate a strong congruence with factors identified in the literature but also highlight the influence of the use of change theory in strengthening sustainability approaches in program development as well as the need to focus on funding options in forward planning. The possible roles for evaluators in assisting program development and supporting the integration of factors supporting sustained use are also discussed.

Journal of Policy Analysis and Management

Greenberg, D., Michalopoulos, C., & Robins, P. (2006). Do experimental and
     nonexperimental evaluations give different answers about the effectiveness of
     government-funded training programs? Journal of Policy Analysis and
     Management, 25(3), 523-552.

Abstract
This paper uses meta-analysis to investigate whether random assignment (or experimental) evaluations of voluntary government-funded training programs for the disadvantaged have produced different conclusions than nonexperimental evaluations. Information includes several hundred estimates from 31 evaluations of 15 programs that operated between 1964 and 1998. The results suggest that experimental and nonexperimental evaluations yield similar conclusions about the effectiveness of training programs, but that estimates of average effects for youth and possibly men might have been larger in experimental studies. The results also suggest that variation among nonexperimental estimates of program effects is similar to variation among experimental estimates for men and youth, but not for women (for whom it seems to be larger), although small sample sizes make the estimated differences somewhat imprecise for all three groups. The policy implications of the findings are discussed.
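
A note for readers unfamiliar with the pooling step behind such comparisons: the sketch below is a minimal fixed-effect (inverse-variance) meta-analysis in Python, using invented numbers rather than the paper's data, to show how estimates from experimental and nonexperimental studies can each be pooled and then contrasted.

    # Minimal fixed-effect meta-analysis sketch (invented numbers, not study data).
    import numpy as np

    def fixed_effect_pool(estimates, std_errors):
        w = 1.0 / np.asarray(std_errors) ** 2          # inverse-variance weights
        pooled = np.sum(w * np.asarray(estimates)) / np.sum(w)
        pooled_se = np.sqrt(1.0 / np.sum(w))
        return pooled, pooled_se

    # Hypothetical effect estimates and standard errors from two groups of studies.
    exp_est, exp_se = [0.12, 0.08, 0.15], [0.05, 0.04, 0.06]
    nonexp_est, nonexp_se = [0.10, 0.18, 0.05], [0.06, 0.07, 0.05]

    pe, se_e = fixed_effect_pool(exp_est, exp_se)
    pn, se_n = fixed_effect_pool(nonexp_est, nonexp_se)
    print(f"experimental pooled effect    = {pe:.3f} (SE {se_e:.3f})")
    print(f"nonexperimental pooled effect = {pn:.3f} (SE {se_n:.3f})")
    print(f"difference = {pe - pn:.3f} (SE {np.sqrt(se_e**2 + se_n**2):.3f})")

The actual study works with several hundred estimates and more elaborate models; the point here is only the inverse-variance weighting and the experimental-versus-nonexperimental contrast.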

2007

American Journal of Evaluation

Christie, C.A. (2007). Reported influence of evaluation data on decision makers'
     actions: An empirical examination. American Journal of Evaluation, 28(1),
     8-25.

Abstract
Using a set of scenarios derived from actual evaluation studies, this simulation study examines the reported influence of evaluation information on decision makers’ potential actions. Each scenario described a context where one of three types of evaluation information (large-scale study data, case study data, or anecdotal accounts) is presented and a specific decision needs to be made. Participants were asked to indicate which type of data presented would influence their decision making. Results from 131 participants indicate that participants were influenced by all types of information, yet large-scale and case study data are more influential relative to anecdotal accounts; certain types of evaluation data are more influential among certain groups of decision makers; and choosing to use one type of evaluation data over the other two depends on the independent influence of other types of evaluation data on the decision maker, as well as prior beliefs about program efficacy.

Educational Research and Evaluation

Muschkin, C., & Malone, P. (2007). Multiple teacher ratings: An evaluation of
    measurement strategies. Educational Research and Evaluation, 13(1), 71-86.

Abstract
This study addresses the questions that arise when collecting, describing, and analyzing information from multiple informants regarding attributes of individual students. Using data from the Fast Track study, we evaluate alternative measurement strategies for using multiple teacher ratings of student adjustment to middle school among a sample of 326 Grade-6 pupils. One goal of the study was to compare the advantages of three measurement strategies using multiple and single informants in terms of their correlation with contemporaneous measures of behavior and academic achievement. Comparisons of residual variance using an aggregated rating, the rating from an "optimal informant," and a score selected at random from the response set, indicate that aggregation provides the highest criterion-related validity. As part of these analyses, we explore the significance of inter-rater concordance, measured in terms of the intraclass correlation coefficient (ICC). Results indicate that for some aggregated scores, reliability can significantly limit their interpretability. The second main goal of the study was to evaluate the effects of variation in the number of teacher ratings on residual variance estimates for aggregate measures in selected behavioral domains. We conclude that the advantages of using multiple ratings are significant with a larger number of informants.
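
The aggregation and reliability ideas in this abstract can be made concrete with a short worked example. The sketch below (simulated data and a hypothetical helper, not the Fast Track ratings or the authors' code) computes a one-way random-effects intraclass correlation coefficient for a single teacher rating and the reliability of the mean of k ratings, which is the Spearman-Brown step-up that makes an aggregated score more dependable than any single informant.

    # One-way ANOVA ICC for multiple teacher ratings of the same students,
    # plus the reliability of the aggregated (mean) rating. Data are simulated.
    import numpy as np

    def icc_oneway(ratings):
        """ratings: (n_students, k_raters) array of scores."""
        n, k = ratings.shape
        grand_mean = ratings.mean()
        row_means = ratings.mean(axis=1)
        ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
        ms_within = np.sum((ratings - row_means[:, None]) ** 2) / (n * (k - 1))
        icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
        icc_average = (ms_between - ms_within) / ms_between   # mean of k raters
        return icc_single, icc_average

    rng = np.random.default_rng(0)
    latent = rng.normal(size=(326, 1))                        # latent student adjustment
    ratings = latent + rng.normal(scale=1.0, size=(326, 4))   # 4 teacher ratings each
    single, aggregated = icc_oneway(ratings)
    print(f"ICC(single rater) = {single:.2f}, ICC(mean of 4 raters) = {aggregated:.2f}")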

Evaluation and Program Planning

Roberts-Gray, C., Gingiss, P.M., & Boerm, M. (2007). Evaluating school capacity to
    implement new programs. Evaluation and Program Planning, 30(3), 247-257.

Abstract
An eight-factor survey-based Bayesian model (Bridge-It) for assessing school capacity to implement health and education programs was tested in secondary analyses of data from 47 schools in the Texas Tobacco Prevention Initiative (TTPI). Bridge-It was used during the pre-implementation phase and again at mid-course of the TTPI 2 years later. Achieved implementation status was evaluated in follow-up almost 4 years after the start of the TTPI. The Bridge-It score aggregated across all eight of the capacity factors predicted both quality of adherence to the Guidelines for School Programs to Prevent Tobacco Use and Addiction and quantity of implementing activity. The school-based leadership factor was an independent predictor of quality of adherence whereas the facilitation processes factor predicted quantity of implementing activity. Integration of Bridge-It, or comparable multi-attribute tools, into the planning and evaluation of school-centered programs can increase understanding of factors that influence implementation and provide guidance for capacity building.

New Directions for Evaluation

Julnes, G., & Rog, D.J. (Eds.). (2007). Informing federal policies on evaluation
    methodology: Building the evidence base for method choice in government
    sponsored evaluation. New Directions for Evaluation, 2007(113).

Articles:

Julnes, G., & Rog, D.J. Current federal policies and controversies over methodology in evaluation. 1-12. (Editors’ Notes)

Chelimsky, E. Factors influencing the choice of methods in federal evaluation practice. 13-33.
Abstract
A critical historical review of the tensions in American governance places the method choice debate in a broader perspective. This chapter reviews the factors that influence the evaluation questions posed to evaluators and, in turn, the method choices that stem from them. Political and professional pressures on the evaluators also influence method choice. Flexibility in methods is considered essential if the evaluator is to design a study that addresses both the context and the specifics of the question.

Datta, L. Looking at the evidence: What variations in practice might indicate. 35-54.
Abstract
This chapter presents the findings from a review of the practice of evaluation in federal agencies as an attempt to inform policies on method choice. The author explores whether federal agencies differ in their approaches to evaluation design and the factors that influence these differences. The nature of the programs, agency culture, evaluator training and experience, and the politics of methodology all emerge as possible context-appropriate influences on method choice.

Boruch, R. Encouraging the flight of error: Ethical standards, evidence standards, and randomized trials. 55-73.
Abstract
Thomas Jefferson recognized the value of reason and scientific experimentation in the eighteenth century. This chapter extends the idea in contemporary ways to standards that may be used to judge the ethical propriety of randomized trials and the dependability of evidence on effects of social interventions.

Yin, R.K., Davis, D. Adding new dimensions to case study evaluations: The case of evaluating comprehensive reforms. 75-93.
Abstract
This chapter describes the adaptation of the case study method to assessing increasingly complex, comprehensive reform initiatives that highlight the blurring of the boundaries between phenomenon and context and the concurrence of multiple interventions. Completed studies of two education reform programs illustrate the ongoing challenges of identifying, measuring, and analyzing large-scale reforms at multiple levels and across sites.

Shadish, W.R., Rindskopf, D.M. Methods for evidence-based practice: Quantitative synthesis of single-subject designs. 95-109.
Abstract
Good quantitative evidence does not require large, aggregate group designs. The authors describe groundbreaking work managing the conceptual and practical demands of developing meta-analytic strategies for single-subject designs, in an effort to add to evidence-based practice.

Greene, J.C., Lipsey, M.W., Schwandt, T.A., Smith, N.L., Tharp, R.G. Method choice: Five discussant commentaries. 111-127.
Abstract
Productive dialogue is informed best by multiple and diverse voices. Five seasoned evaluators, representing a range of evaluation perspectives, offer their views in two- to three-page discussant contributions. These individuals were asked to reflect and comment on the previous chapters in the spirit of critical review as a key source of evidence about methods choice.

Julnes, G., Rog, D.J. Pragmatic support for policies on methodology. 129-147.
Abstract
This final chapter summarizes the areas of consensus in the debate on method choice, including considering the nature of the primary evaluation questions, the nature of the phenomenon being evaluated, the constraints on the evaluation, and ethical issues. Pragmatic suggestions based on these areas as well as areas still in contention are offered.

Journal of Policy Analysis and Management

Wilde, E.T., & Hollister, R. (2007). How close is close enough? Evaluating
    propensity score matching using data from a class size reduction experiment.
    Journal of Policy Analysis and Management, 26(3), 455-477.

Abstract
In recent years, propensity score matching (PSM) has gained attention as a potential method for estimating the impact of public policy programs in the absence of experimental evaluations. In this study, we evaluate the usefulness of PSM for estimating the impact of a program change in an educational context (Tennessee's Student Teacher Achievement Ratio Project [Project STAR]). Because Tennessee's Project STAR experiment involved an effective random assignment procedure, the experimental results from this policy intervention can be used as a benchmark, to which we compare the impact estimates produced using propensity score matching methods. We use several different methods to assess these nonexperimental estimates of the impact of the program. We try to determine how close is close enough, putting greatest emphasis on the question: Would the nonexperimental estimate have led to the wrong decision when compared to the experimental estimate of the program? We find that propensity score methods perform poorly with respect to measuring the impact of a reduction in class size on achievement test scores. We conclude that further research is needed before policymakers rely on PSM as an evaluation tool.
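
For readers new to the technique being benchmarked, the following sketch shows a generic propensity-score-matching workflow in Python (scikit-learn and NumPy, synthetic data). It is not the authors' implementation; it only illustrates the three steps their comparison presupposes: estimate propensity scores with logistic regression, match each treated unit to the comparison unit with the nearest score, and contrast the matched estimate with a benchmark.

    # Illustrative propensity score matching (PSM) workflow, not the study's code.
    # Treatment assignment depends on observed covariates, so the naive
    # comparison is biased; matching on the estimated propensity score
    # recovers something close to the true effect (2.0) in this simulation.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(42)
    n = 5000
    x = rng.normal(size=(n, 3))                              # observed covariates
    p_treat = 1 / (1 + np.exp(-(x[:, 0] + 0.5 * x[:, 1])))   # selection on observables
    treated = rng.random(n) < p_treat
    outcome = 2.0 * treated + x @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

    # Step 1: estimate propensity scores.
    scores = LogisticRegression(max_iter=1000).fit(x, treated).predict_proba(x)[:, 1]

    # Step 2: match each treated unit to the nearest-score comparison unit.
    nn = NearestNeighbors(n_neighbors=1).fit(scores[~treated].reshape(-1, 1))
    _, idx = nn.kneighbors(scores[treated].reshape(-1, 1))
    matched_controls = outcome[~treated][idx.ravel()]

    # Step 3: compare the naive and matched estimates of the treatment effect.
    print("naive difference:", outcome[treated].mean() - outcome[~treated].mean())
    print("PSM estimate    :", (outcome[treated] - matched_controls).mean())

In the article the benchmark is the experimental (random-assignment) estimate from Project STAR rather than a known simulated effect, and the matching and diagnostic procedures are considerably more careful.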

Studies in Educational Evaluation

Anderson, L.W. (2007). The educator role of educational evaluators: A tribute to
    Arieh Lewy. Studies in Educational Evaluation, 33(1), 15-28.

Abstract
Educators, practitioners, and policy makers have long debated the impact of the results of evaluation studies on educational practice. If evaluation is to influence practice, the communication gap between researchers and decision makers must be reduced or eliminated. In order to close this gap, evaluators must become educators, working with a variety of intended audiences to help them understand the methodology and results of evaluation studies. To do this, a greater understanding of the intended audience(s) is needed. In addition, conceptual frameworks must be developed to facilitate a common or shared understanding of the results of evaluation studies.

Standards Theme

2006

Educational Measurement: Issues and Practice

Camara, W., & Lane, S. (2006). A historical perspective and current views on the Standards for Educational and Psychological Testing. Educational Measurement: Issues and Practice, 35(3), 8-25.

Abstract
The Standards for Educational and Psychological Testing have evolved in the breadth and depth of coverage of issues in educational testing and measurement since their first publication in 1954. There were a number of substantive changes in the 1999 revision that addressed validity, fairness, accommodations, and compliance with the Standards. In addition, there was nearly a 50% increase in the number of standards contained in the last revision. The next revision of the Standards may be initiated in 2007 and there are remaining concerns about access and awareness by non-measurement professionals, compliance by test publishers and users, relevance in addressing mandates for accountability, and substantive areas of educational assessment. This review of major changes to the Standards and discussion of future topics is designed to inform the next revision.

2007

Educational Measurement: Issues and Practice

Webb, N.M., Herman, J.L., & Webb, N.L. (2007). Alignment of mathematics state-
    level standards and assessments: The role of reviewer agreement. Educational
    Measurement: Issues and Practice, 26(2), 17-29.

Abstract
This article examines the role of reviewer agreement in judgments about alignment between tests and standards. We used case data from three state alignment studies to explore how different approaches to incorporating reviewer agreement change alignment conclusions. The three case studies showed varying degrees of reviewer agreement about correspondences between objectives and test items. Moreover, taking into account reviewer agreement in the analyses sometimes had a marked effect on alignment conclusions. We discuss reasons for differences across case studies and alignment approaches, as well as implications for future alignment efforts.

 
