The Dark Side of Outcome Evaluation

Doug Easterling

This piece was first published in the newsletter of the Grantmakers Evaluation Network, Volume 9/Number 1, Winter 2001. It is published here with permission from Doug Easterling.

Although foundations have long been concerned about producing social benefits from their grants, the sector is now experiencing a pronounced movement toward measurable results. Two highly influential Harvard Business Review articles ("Virtuous Capital" by Letts, Ryan and Grossman in 1997 and "Creating Value" by Porter and Kramer in 1999) have called for foundations to act more like investors than donors, and more specifically to judge their own success in terms of the return they achieve on their grants.

As more and more foundations adopt the "grants-as-investments" mind-set, the call for outcome evaluation reverberates more loudly. Foundation boards are beginning to enlarge the concept of "fiduciary responsibility" to include not only the IRS requirement that grants be spent on charitable purposes, but also the economist's imperative to maximize the expected benefits of an expenditure (albeit in keeping with the foundation's mission and risk tolerance). To satisfy these new demands, grantees face mounting pressure to document that they have produced a discernible impact with the foundation's dollars. As a result, nonprofit organizations are now devoting considerable resources to building their capacity to design and carry out program evaluation, or at a minimum to work productively with outside evaluators. In a very real sense, grantees have come to regard outcome evaluation as a yardstick for proving their worth to funders.

As evaluation becomes a de rigueur feature of the grant agreement, it is important to maintain a clear sense of the practical limitations that arise when outcome evaluation is applied in the nonprofit sector. First and foremost, few (if any) grantee organizations are in a position to perform the level of evaluation that would provide a funder with a quantitative assessment of the "value" of its grant. Even with all the funding, coaching, workbooks, and training that foundations have made available over the past five years, nonprofits have not "learned" to conduct rigorous program evaluation (i.e., evaluation that shows the precise effect a program has on its clients). Rather, most evaluations conducted in the nonprofit sector strive simply to track program participants over time on key indicators. Few make use of control groups (i.e., experimental designs) or even good comparison groups (i.e., quasi-experimental designs). Thus, we generally don't know what would have happened to program participants in the absence of the program. Evaluations are even less definitive for prevention programs (e.g., a teen-pregnancy curriculum for middle-school girls), because there is a time lag between the intervention and the target behavior. Programs that "target" the entire community are even more complicated to evaluate because of uncertainty about where to look for effects and whom to include in the sample.

Given all the limitations associated with conducting rigorous outcome evaluation in the nonprofit sector, should foundations encourage this practice — by raising the standards for grantees, by providing additional resources for evaluation, or by hiring an outside evaluator to conduct more rigorous studies? On the one hand, having “hard” outcome data might allow foundations to compute the return on their grantmaking strategy and to direct future grants to the most promising prospects (just as finance committees rely on investment-performance measures to determine where to invest the foundation's assets). On the other hand, rigorous outcome evaluation requires an approach and an ethic that may inhibit the healthy development and implementation of programs within the nonprofit sector.

Two of the most critical challenges associated with rigorous evaluation are standardizing the intervention and sharing the evaluation results. If an evaluation is to maintain its internal validity, the intervention must remain constant over time and the evaluator must not discuss findings with the program implementers while the study is still ongoing. Obviously, foundations can impose both of these conditions upon their grantees. For example, the grantee might be required to carry out the program exactly as it was described in the grant proposal. However, this approach places the foundation in much more of a policing role, while also discouraging organizational learning and ongoing program improvement on the part of the grantee.

In practice, most grantees seem to be best served by an approach that stresses formative evaluation over rigorous outcome evaluation. Formative evaluation involves clearly specifying the desired outcomes of the program, along with fleshing out the assumptions and theories underlying the program (using either a logic model or a theory of change). Measurement focuses on the process of program implementation and on those outcomes that are expected to occur “early” in the change process. These data allow for an empirical test of whether the program is working as intended, and also suggest areas for improving either the design or the implementation of the program model. As the program begins to stabilize (i.e., refinements are fewer and farther between), the evaluation can shift to more of an outcome strategy. Whether to then apply a rigorous evaluation design will depend on how compelling the early effects appear to be and on the potential for disseminating the program to other sites.

In sum, foundations need to be cautious in proffering the rubric and tools of outcome evaluation, particularly when it comes to supporting their own desire to appraise the value of their grantmaking. Although all organizations can benefit from incorporating some degree of evaluation into their programming, this exercise can prove detrimental if taken to the extreme. Evaluating the precise effect of a program requires not only a significant outlay of resources (e.g., funding, staff time, client time) but also a shift in the organization's focus, and even in its fundamental mission — from creating change to creating knowledge. Likewise, a foundation that focuses too much attention on measuring the value of its grants may be less effective in actually adding value to the work of its grantees.

Postscript: Outcome Evaluation and the Arts

In correspondence with Reader editors as this piece was prepared for print, Doug Easterling wrote:

I am very interested in community-based arts projects, particularly as they work to improve overall community health and well-being. I'm currently involved in the planning process of a Wallace-funded initiative (“Communities in the Know”) that looks to the arts as one vehicle for promoting learning among elementary and middle school youth. Also, my wife is an artist (jeweler/metalsmith/design teacher), so I have a personal interest in all this.

Then, we asked him, “Do you think there are any pertinent considerations around outcome evaluation that are specific to the arts/culture field?” He responded:

The main issue I see is that the really important outcomes that arts programs strive for tend to be the most difficult to measure in a meaningful way. (Health-care programs are much easier to evaluate, except when they are preventive in nature.) Because of the drive for numbers, it seems that most arts programs report on things like attendance and membership, which don't yield a complete picture of how the people who participate are affected. Even though I argued against highly rigorous (i.e., scientific) approaches to outcome evaluation in the article, I think that arts programs can do a much better job in capturing the increased awareness and the new ways of “seeing” that result from effective programs. But these “data” will be descriptive and qualitative (e.g., stories), and thus unlikely to translate into the dollar equivalents that hard-nosed funders are hungry for.

Doug Easterling is director, Division for Community-based Evaluation, Center for the Study of Social Issues, University of North Carolina at Greensboro.