Version 0.2 Published November 2016
This document describes the standard for conducting an impact audit, a short-term engagement with a nonprofit. An impact audit has two objectives:
- Help nonprofits: Guide nonprofits to strengthen their use and production of appropriate evidence in order to deliver more effective programs. Success is measured by whether nonprofits shift toward interventions with greater impact and evidence, as appropriate.
- Help donors: Rate the impact of nonprofits to guide donors toward nonprofits that credibly advance their missions. Success is measured by whether donations shift to nonprofits that generate greater impact and have stronger evidence to support that impact.
Each impact audit produces two deliverables:
- If the nonprofit agrees to publication, the nonprofit is issued an impact audit report that rates it on Quality of Impact Evidence, Quality of Monitoring Systems, and Learning and Iteration, and that reports its Cost of Impact. The ratings are accompanied by additional information and analysis to help donors evaluate the nonprofit.
- The impact audit team produces a private management letter with recommendations on how the nonprofit can improve its intervention and operations. The letter includes relevant resources and summaries of the scientific literature.
This standard is written for single-program nonprofits with a charitable mission that are directly delivering an intervention to their beneficiaries. This standard applies to nonprofits across sectors, although its specific implementation will vary by sector. We also recommend this standard for use by funders who are assessing nonprofits for funding.
We welcome your feedback.
To illustrate how the rating scheme is applied in real-world contexts, we provide vignettes of fictional nonprofits (labeled A-H) that are working to increase childhood vaccination rates. The vignettes illustrate the ratings for nonprofits of different quality. Nonprofits A and B run a television campaign to disseminate information on the importance of vaccination. Nonprofits C and D run informational sessions for new mothers. Nonprofits E-H run informational sessions for new mothers and provide a financial reward to the mother if her child completes the full vaccination sequence.
Quality of Impact Evidence
Why we rate
Quality of Impact Evidence captures how confident we are that the nonprofit’s program is leading to impact on its primary outcomes. Nonprofits with high-quality evidence of impact are delivering an intervention with proven impact. Nonprofits delivering interventions with lower quality of evidence may still be achieving impact, but there is less proof to substantiate that impact.
Quality of Impact Evidence informs how donors assess Cost of Impact; nonprofits with higher-quality evidence have more reliable impact figures.
How we rate
Evidence can either come from internal data collected directly on the nonprofit’s program (“internal evidence”) or from data collected on interventions similar to the nonprofit’s program (“external evidence”). When considering internal evidence, we look at the quality of that evidence in substantiating the impact of the program. We also consider the relevance of that evidence to the nonprofit’s current program. For instance, evidence that was collected on a previous iteration of the nonprofit’s model will be scored as less relevant to the current intervention.
For internal evidence, we consider both internal evaluations produced by the nonprofit and independent evaluations of the nonprofit’s program.
When considering external evidence, we look for studies from elsewhere that were conducted on a similar intervention and have a similar theory of change. We evaluate both the quality of the evidence and the relevance of that evidence to the nonprofit’s program.
When rating Quality of Impact Evidence, we first classify the nonprofit by program stage. The program stage determines what criteria the nonprofit is assessed against.
|Program stage|Description|
|---|---|
|Design|The program model is undergoing change.|
|Validation|The nonprofit is testing the program’s impact.|
|Scaling|The nonprofit is in the process of expanding its program.|
For direct service delivery interventions, we generally rate the quality of evidence as follows:
|Evidence quality|Types of evidence|
|---|---|
|Low|A poorly conducted observational or experimental analysis.|
|Medium|An experimental or quasi-experimental analysis with a strong counterfactual that may have limitations that bias the results.|
|Medium|A well-conducted observational analysis that shows large, consistent effects on outcomes without a strong counterfactual.|
|High|A well-conducted experimental or quasi-experimental analysis with a strong counterfactual that does not have limitations that may bias the results.|
Analyses that rely only on anecdote generally do not provide evidence of impact. For interventions where it is not possible to collect counterfactual evidence, we rate the quality of evidence based on what evidence is feasible to collect.
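The evidence-quality table can be read as a simple decision rule. The sketch below is a hypothetical illustration, not the audit team's actual procedure: the function name, the flag names and the fallback for well-conducted analyses the table does not cover (for example, an observational study without large, consistent effects, treated here as Low) are all assumptions, and in practice each flag is itself a judgment call.

```python
from enum import IntEnum


class Quality(IntEnum):
    NONE = 0    # anecdote only: generally not evidence of impact
    LOW = 1
    MEDIUM = 2
    HIGH = 3


DESIGNS = {"anecdote", "observational", "quasi-experimental", "experimental"}


def rate_evidence_quality(
    design: str,
    well_conducted: bool,
    strong_counterfactual: bool = False,
    bias_limitations: bool = False,
    large_consistent_effects: bool = False,
) -> Quality:
    """Hypothetical encoding of the evidence-quality table (direct service delivery)."""
    if design not in DESIGNS:
        raise ValueError(f"unknown design: {design!r}")
    if design == "anecdote":
        return Quality.NONE
    if not well_conducted:
        # A poorly conducted observational or experimental analysis.
        return Quality.LOW
    if design in ("experimental", "quasi-experimental") and strong_counterfactual:
        # Strong counterfactual: Medium if limitations may bias results, else High.
        return Quality.MEDIUM if bias_limitations else Quality.HIGH
    if design == "observational" and large_consistent_effects:
        # Well-conducted observational analysis with large, consistent effects.
        return Quality.MEDIUM
    # Case not covered by the table; treated as Low here (assumption).
    return Quality.LOW
```

For instance, `rate_evidence_quality("quasi-experimental", True, strong_counterfactual=True, bias_limitations=True)` returns `Quality.MEDIUM`, matching how the vignettes below describe quasi-experimental studies "with some flaws that may bias the results".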
For direct service delivery interventions, we generally rate the relevance of evidence by considering the intervention’s complexity, design and fidelity, as well as the geographic setting, the population reached and any other contextual factors identified as relevant to the intervention’s theory of change.
Nonprofits in the design stage
Replication of an ineffective intervention
Nonprofit A is designing a program with a television ad encouraging mothers to vaccinate their children. Although the vaccination sequence has strong evidence of efficacy, this particular intervention has been studied extensively elsewhere, and those studies found no impact on vaccination. There is no case for why this program would work when it has not worked elsewhere.
No logical case (in terms of theory and applicable empirical data)
Nonprofit B is just like Nonprofit A, but this intervention has not yet been tested elsewhere. Although vaccines have strong evidence of efficacy and are accepted as best practice, there are substantial reasons to doubt that Nonprofit B’s intervention will succeed: television penetration is low, and households with televisions are strongly correlated with those whose children are already vaccinated.
Weak logical case (in terms of theory and applicable empirical data)
Nonprofit C is designing a program with an informational session on the importance of completing vaccination sequences with new mothers at health clinics. The vaccines have strong evidence of efficacy and are accepted in the medical community as best practice. However, evidence suggests that most mothers, when asked, report knowing that vaccinations are important; thus, it is not obvious that information is the key constraint driving low completion rates for vaccination. In this area it may be that mothers are not well informed, but Nonprofit C has not collected the necessary data to establish that this is the case.
Good logical case (in terms of theory and applicable empirical data)
Nonprofit D is just like Nonprofit C, except it has done strong diagnostic work establishing that most mothers do not seem to have good information about the importance of vaccines. However, evidence from elsewhere on the efficacy of information sessions is mixed.
Strong logical case (in terms of theory and applicable empirical data)
Nonprofit E is designing a program with an informational session and financial incentives to complete immunization sequences. This intervention is similar to a program that used non-financial incentives and was shown in a randomized trial to increase vaccination. Because the effect of incentives depends less on delivery quality, the evidence from elsewhere is highly relevant.
Nonprofits in the validation stage
Ratings at this stage draw on both external evidence and internal evidence. Internal evidence is classified as: not producing, producing low-quality, producing medium-quality or producing high-quality.
Producing no internal evidence and has no external evidence
Nonprofit A is validating a program with a television ad on the importance of vaccination. It is not producing data on changes in vaccination rates. There is no external evidence.
Producing low-quality internal evidence and has no external evidence
Nonprofit B is just like Nonprofit A, but Nonprofit B conducts periodic before-and-after surveys on vaccination rates in targeted communities. However, there is little credible reason to attribute any changes in vaccination rates to the intervention. There is no external evidence.
Producing no internal evidence and has low-applicability external evidence
Nonprofit C is validating a program with an informational session on the importance of completing vaccination sequences. Nonprofit C is collecting no data on vaccination rates or behavior change. However, a study in South Africa suggests a similar pamphlet increased vaccination rates. The study is of medium quality, but the pamphlet’s messaging and target population are sufficiently different that the evidence has only low relevance.
Producing medium-quality internal evidence
Nonprofit D is just like Nonprofit C, except it is conducting a study comparing vaccination rates in intervention districts with matched districts. The study is of medium quality.
Producing low-quality internal evidence and has medium-quality external evidence
Nonprofit E is validating a program with an informational session and financial incentives to complete immunization sequences. Nonprofit E is conducting a low-quality before-and-after study to measure vaccination rates in its district. Two quasi-experimental studies (with some flaws that may bias the results) on similar programs elsewhere found strong impact on vaccination.
Producing high-quality internal evidence
Nonprofit F is like Nonprofit E, but is conducting a well-designed experiment to test impact.
Producing medium-quality internal evidence and has high-quality external evidence
Nonprofit G is like Nonprofit E, but is conducting a medium-quality study. Two rigorous studies from similar countries suggest presentations and cash incentives increase vaccination rates.
Nonprofits in the scaling stage
Ratings at this stage draw on both external evidence and internal evidence. Internal evidence is classified as: none, low-quality, medium-quality or high-quality.
Has no internal or external evidence
Nonprofit A is scaling a program with a television ad on the importance of vaccination. Nonprofit A has no internal evidence to demonstrate its television ad leads to attributable change in vaccination rates. There is no external evidence on the intervention.
Has low-quality internal evidence and no external evidence
Nonprofit B is just like Nonprofit A, but has conducted periodic before-and-after surveys on vaccination rates. There is no external evidence.
Has no internal evidence and low-applicability external evidence
Nonprofit C is scaling a program with an informational session on the importance of completing vaccination sequences with new mothers at health clinics. Nonprofit C has produced no internal evidence. However, a study in South Africa suggests a similar pamphlet increased vaccination rates. The study is of medium quality, but the pamphlet’s messaging and target population are sufficiently different that the evidence has only low relevance.
Has medium-quality internal evidence
Nonprofit D is just like Nonprofit C, except it has conducted a quasi-experimental study on the program. The study compared vaccination rates in intervention districts with matched districts. The study was of medium quality and found a positive, statistically significant impact of the program.
Has low-quality internal evidence and medium-applicability external evidence
Nonprofit E is scaling a program with an informational session and financial incentives to complete immunization sequences. Nonprofit E conducted a low-quality before-and-after study to measure vaccination rates in its district, which found a positive, statistically significant impact of the program. Two quasi-experimental studies (with some flaws that may bias the results) have been conducted on similar programs elsewhere and found strong, statistically significant impacts on vaccination rates.
Has high-quality internal evidence
Nonprofit F is like Nonprofit E, but has conducted a well-designed experimental study that found a positive, statistically significant impact of its intervention on vaccination rates.
Has medium-quality internal evidence and high-applicability external evidence
Nonprofit G is like Nonprofit E, but has conducted a quasi-experimental study that found increased attendance at vaccination days after implementation, relative to matched districts. Two rigorous randomized trials from similar countries found that a very similar program has a positive, statistically significant impact on vaccination rates.
Cost of Impact
Why we rate
We report the best available estimates of average cost and impact on outcomes, giving a donor perspective on what a donation to the organization could achieve. Cost of Impact analyses always rely on substantial judgments and imperfect data. All estimates are imprecise, but give the donor a general view of the impact of a dollar.
How we rate
We calculate and report the estimated average impact per beneficiary and the average total cost per beneficiary. These figures are best estimates based on the available internal and external data. Cost of Impact is not rated, but is reported along with program stage, geography of delivery and other contextual factors to guide donors. We also provide analysis on how to interpret these figures.
We report the average cost to deliver programs, including all fundraising and management costs, as well as costs incurred by small programs that do not directly target the nonprofit’s primary outcomes. Our objective is to give donors an estimate of the total cost of achieving the outcomes the nonprofit holds itself accountable to, and average cost provides a better estimate of this than marginal cost. We include costs borne by participants and by other organizations contributing to implementation. Attempting to adjust these figures can also introduce bias, so we prefer to report average costs to make estimates as comparable as possible across organizations.
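A minimal sketch of the average-cost arithmetic, assuming all cost components are measured over the same period and in the same currency. The class, its field names and the figures in the usage note are hypothetical. The point is that every component (fundraising, management, small non-core programs, and costs borne by participants and partner organizations) enters the numerator, and the total is divided by all beneficiaries, which yields an average rather than a marginal cost.

```python
from dataclasses import dataclass


@dataclass
class ProgramCosts:
    """Hypothetical cost components for one program over one period, in one currency."""
    program_delivery: float
    fundraising: float
    management: float
    small_programs: float     # small programs not directly targeting the primary outcomes
    participant_costs: float  # costs borne by participants
    partner_costs: float      # costs borne by other organizations contributing to delivery

    def total(self) -> float:
        return (
            self.program_delivery
            + self.fundraising
            + self.management
            + self.small_programs
            + self.participant_costs
            + self.partner_costs
        )


def average_cost_per_beneficiary(costs: ProgramCosts, beneficiaries: int) -> float:
    """Average (not marginal) total cost per beneficiary."""
    if beneficiaries <= 0:
        raise ValueError("beneficiaries must be positive")
    return costs.total() / beneficiaries
```

With purely illustrative components `ProgramCosts(600_000, 100_000, 150_000, 50_000, 60_000, 40_000)` and 10,000 beneficiaries, the total is 1,000,000 and the average cost is 100.0 per beneficiary.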
We typically report one outcome per program. However, if a program is credibly affecting multiple outcomes independently, we report multiple outcomes. We estimate total lifetime benefits of the program, discounting future benefits at a 5% rate. For programs where theory suggests benefits extend longer than what has been measured in studies, we conservatively extend the length of benefits. We base impact estimates on sources available in this order:
- Studies conducted directly on the nonprofit’s program
- Studies or meta-analyses of the intervention from elsewhere
- If none available, we construct a model based on assumptions or do not report
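The discounting rule and order of sources described above can be sketched as two small functions. The 5% rate comes from this standard. Treating the first year as undiscounted, the function names, and representing "do not report" as `None` are assumptions made for illustration.

```python
from typing import Optional, Sequence


def present_value(annual_benefits: Sequence[float], rate: float = 0.05) -> float:
    """Discount a stream of annual benefits at `rate` (5% per the standard).

    The first entry (year 0) is left undiscounted; this timing convention is an assumption.
    """
    return sum(b / (1 + rate) ** t for t, b in enumerate(annual_benefits))


def choose_impact_estimate(
    internal: Optional[float], external: Optional[float], modeled: Optional[float]
) -> Optional[float]:
    """Return the first available estimate in the standard's order of preference.

    internal: studies conducted directly on the nonprofit's program
    external: studies or meta-analyses of the intervention from elsewhere
    modeled:  a model based on assumptions
    None means no estimate is available, i.e. impact is not reported.
    """
    for estimate in (internal, external, modeled):
        if estimate is not None:
            return estimate
    return None
```

For example, a benefit of 1.0 per year for three years has a present value of about 2.86 at 5%, and `choose_impact_estimate(None, 0.2, 0.5)` falls back to the external estimate, 0.2.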
We attempt to report outcome metrics in a standardized way. For different nonprofits targeting the same outcomes, we use common assumptions as much as possible to increase comparability. We include costs to beneficiaries and costs borne by others to deliver the program. Finally, we report displacement, externalities and other contextual factors that could influence the impact of the program. For more information, see “How we Assess Nonprofit Cost of Impact” (link forthcoming).
Quality of Monitoring Systems
Why we rate
Quality of Monitoring Systems captures how well the nonprofit produces and uses data to ensure it is consistently delivering its program at high quality. Nonprofits with strong monitoring systems can credibly show that their programs are reaching the claimed number of beneficiaries, and that nonprofit staff have the data and systems to identify problems in implementation and take action to correct those problems. Furthermore, appropriate monitoring data demonstrate that the nonprofit is continuing to achieve impact, even when it is not directly evaluating outcomes with a counterfactual.
How we rate
We rate the quality of the following systems used to monitor delivery of the intervention:
- Activity: track program activities and outputs delivered
- Targeting: identify beneficiaries to receive the program
- Engagement: track if participants are taking up the program and meeting targets
- Feedback: understand how participants view the program
- Outcomes: measure changes in beneficiary outcomes
Each data system is assessed to determine if it generates data that is credible, actionable, responsible and transportable.
Monitoring systems are credible if they collect high-quality data that are analyzed accurately. To determine if a monitoring system is credible, we look at the following sub-criteria:
Valid: Data capture the essence of what the organization is seeking to measure.
Reliable: The same data collection procedure will produce the same data repeatedly.
Unbiased: Data have no systematic errors.
Monitoring systems are actionable if the nonprofit commits to act on the data it collects. To determine if a monitoring system is actionable, we look at the following sub-criteria:
Ready for decision-making: Data are analyzed and reported in an accurate and timely way that meets the needs of staff and decision-makers.
Addresses risks and assumptions: Data enable monitoring of the main risks and testing of the main assumptions in the nonprofit’s theory of change.
Commitment to take action: Staff at all levels systematically review and respond to reported data.
Monitoring systems are responsible if the nonprofit minimizes the burden of data collection and collects data ethically. To determine if a system is responsible, we look at the following sub-criteria:
Minimizes burden: The nonprofit has taken reasonable steps to minimize the burden both on beneficiaries and on the nonprofit’s financial and staff resources.
Ethical: Risks of data disclosure or other misuse have been considered and mitigated.
Monitoring systems are transportable if the data collected are tied to the theory of change and shared appropriately. To determine if a system is transportable, we look at the following sub-criteria:
Fit to theory of change: Monitoring systems are closely tied to the nonprofit’s theory of change.
Transparent: The nonprofit shares information about its monitoring systems and summary results of monitoring data publicly. The nonprofit shares requested information during the impact audit.
We score each sub-criterion “Yes”, “No” or “Inconclusive”. If all sub-criteria are scored “Yes” or “Inconclusive”, the overall criterion is scored “Yes”; otherwise, it is scored “No”. We rate nonprofits at all stages based on the proportion of “Yes” scores out of all 20 possible “Yes” scores (four criteria applied to five types of monitoring systems). Stars are assigned as follows:
Scored “no” on credible criterion for activity monitoring system
Nonprofit A runs a television ad on the importance of vaccination. Nonprofit A cannot produce accurate information about where and when the ad was shown.
Scored “yes” on credible criterion for targeting monitoring system
Nonprofit B runs a program similar to Nonprofit A’s. Nonprofit B uses representative household surveys to determine which television shows are most often watched by pregnant mothers from populations that tend not to vaccinate, and uses that information to reserve ad space.
Scored “no” on actionable criterion for engagement monitoring system
Nonprofit C runs a program with an informational session on the importance of completing vaccination sequences. Field staff for Nonprofit C collect detailed information about attendance and participation at these sessions, but this information is never digitized and transmitted to program managers at the nonprofit’s headquarters and so does not inform decision-making.
Scored “yes” on actionable criterion for feedback monitoring system
Nonprofit D runs a program similar to Nonprofit C’s. After each session, Nonprofit D surveys a random sample of attending mothers about their attitudes, opinions and knowledge. Program managers summarize this information and report it quarterly to senior leadership, who review the reports. The curriculum has been modified several times in response to insights from these surveys.
Scored “no” on responsible criterion for outcomes monitoring system
Nonprofit E runs a program with an informational session and financial incentives to complete immunization sequences. Nonprofit E surveys mother and child outcomes by going door-to-door twelve months after its program concludes. However, instead of taking a representative sample, Nonprofit E interviews every mother who was offered a financial reward.
Scored “yes” on responsible criterion for outcomes monitoring system
Nonprofit F runs a program similar to Nonprofit E’s. Nonprofit F conducts a similar survey, but constructs a representative sample to minimize the burden on beneficiaries and on Nonprofit F’s staff and financial resources.
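The standard does not prescribe a sampling method. One common way to build a representative sample like Nonprofit F’s is a simple random sample drawn from the roster of mothers who were offered the reward, sized to estimate a proportion (such as the share of children completing the vaccination sequence) within a chosen margin of error. This sketch uses the textbook formula n0 = z^2 p(1 - p) / e^2 with a finite-population correction. The function names, the 95% confidence level (z = 1.96) and the conservative p = 0.5 are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_size_for_proportion(
    margin: float, z: float = 1.96, p: float = 0.5, population: Optional[int] = None
) -> int:
    """Sample size to estimate a proportion within +/- margin at the given z."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite-population correction: smaller programs need fewer interviews.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


def draw_survey_sample(roster: Sequence[T], margin: float, seed: int = 0) -> List[T]:
    """Simple random sample (without replacement) of beneficiaries to interview."""
    n = min(sample_size_for_proportion(margin, population=len(roster)), len(roster))
    return random.Random(seed).sample(list(roster), n)
```

Under these assumptions, a roster of 2,000 mothers and a margin of plus or minus 5 percentage points calls for 323 interviews rather than 2,000.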
Scored “no” on transportable criterion for targeting monitoring system
Nonprofit G runs a program similar to Nonprofit E’s. Nonprofit G has stated that its intended recipient population is the extreme poor. However, to identify these individuals, the nonprofit uses location data. In the area in which Nonprofit G works, the extreme poor tend to be intermingled with individuals of higher economic status. Therefore, Nonprofit G’s targeting systems are not closely aligned to its theory of change.
Scored “yes” on transportable criterion for engagement monitoring system
Nonprofit H runs a program similar to Nonprofit E’s. Nonprofit H collects information about mothers’ participation in information sessions, interest in financial incentives, and attendance at all vaccination days. This data is closely tied to Nonprofit H’s theory of change. In addition, Nonprofit H shares substantial data publicly about how it monitors engagement and shared all requested information during the impact audit. Nonprofit H has systems for monitoring activities, targeting, engagement, feedback and outcomes, and all data is credible, actionable and responsible.
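The Quality of Monitoring Systems scoring rolls up in two steps: sub-criteria are combined into a criterion score, then the criterion scores are counted against the 20 possible “Yes” scores. The sketch below expresses that roll-up. The identifiers are hypothetical, and a system or sub-criterion that was not assessed is treated as “No”, which is an assumption the standard does not state. The final conversion from this share to stars is left out.

```python
from enum import Enum
from typing import Dict, Mapping, Tuple


class Score(Enum):
    YES = "Yes"
    NO = "No"
    INCONCLUSIVE = "Inconclusive"


# Five monitoring systems and four criteria: 5 x 4 = 20 possible "Yes" scores.
SYSTEMS = ("activity", "targeting", "engagement", "feedback", "outcomes")
CRITERIA: Dict[str, Tuple[str, ...]] = {
    "credible": ("valid", "reliable", "unbiased"),
    "actionable": ("ready_for_decision_making", "addresses_risks_and_assumptions",
                   "commitment_to_take_action"),
    "responsible": ("minimizes_burden", "ethical"),
    "transportable": ("fit_to_theory_of_change", "transparent"),
}


def criterion_score(sub_scores: Mapping[str, Score], criterion: str) -> Score:
    """Yes if every sub-criterion is Yes or Inconclusive; otherwise No."""
    passing = (Score.YES, Score.INCONCLUSIVE)
    # Sub-criteria that were not assessed default to No (assumption).
    if all(sub_scores.get(sub, Score.NO) in passing for sub in CRITERIA[criterion]):
        return Score.YES
    return Score.NO


def monitoring_yes_share(ratings: Mapping[str, Mapping[str, Score]]) -> float:
    """Share of "Yes" criterion scores across all system x criterion pairs."""
    possible = len(SYSTEMS) * len(CRITERIA)  # 20
    yes = sum(
        criterion_score(ratings.get(system, {}), criterion) == Score.YES
        for system in SYSTEMS
        for criterion in CRITERIA
    )
    return yes / possible
```

Under this rule, a single “No” on one credible sub-criterion for the activity system (as with Nonprofit A) costs exactly one of the 20 scores, giving a share of 0.95, while an “Inconclusive” costs nothing.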
Learning and Iteration
Why we rate
Learning and Iteration captures how a nonprofit learns about its own model and makes decisions to change its model. We reward nonprofits that are using high-quality data to learn about areas for improvement, and then act on that data to iterate their model systematically and periodically. Such nonprofits are more likely to maintain and increase the impact of their core programs, and are likely to be more resilient and responsive to shocks and other changes within the nonprofit and the environment in which it works.
How we rate
We rate how well the nonprofit uses data to learn what does and does not work, and then appropriately iterates on its model.
We rate nonprofits at all stages against the following criteria:
No iteration or arbitrary iterations in operations or model
Nonprofit A runs a television ad on the importance of vaccination. Nonprofit A has run the same ad campaign for several years without making any effort to understand how the campaign has been received or to iterate on its core message or the design of the intervention.
Nonprofit iterates its model based on data that is low quality
Nonprofit B is just like Nonprofit A, but Nonprofit B recently changed the design of its ad. However, it made the decision based on a few anecdotal suggestions from staff. Nonprofit B did not pilot the new ad and collected no data on changes to viewer behavior following the change.
Nonprofit iterates its model based on data that is high quality
Nonprofit C runs a program with an informational session on the importance of completing vaccination sequences. Nonprofit C analyzed data on vaccination rates nationally last year and shifted the geographies targeted by its program to better reach populations with the lowest vaccination rates. However, Nonprofit C does not have systems for routinely re-assessing vaccination rates and re-targeting its operations.
Nonprofit systematically and continuously iterates its model based on data that is high quality
Nonprofit D runs a similar program to Nonprofit C. Nonprofit D holds quarterly sessions to collate learning from monitoring systems, field staff and outside sources and to plan changes to its program. All changes are piloted first on a subset of participants in the program, and data is collected on participant behavior. If the design iteration is significant, Nonprofit D conducts an A/B test to rigorously compare outcomes for participants receiving the new design to those receiving the existing design.
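For a binary outcome such as whether a child completes the vaccination sequence, one standard way to compare the new and existing designs in an A/B test like Nonprofit D’s is a two-proportion z-test. The standard does not specify an analysis method. This sketch assumes that individuals are randomized independently to each arm (clustered randomization, for example by clinic, would need a different analysis). The function name and the counts in the usage note are hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    successes_new: int, n_new: int, successes_old: int, n_old: int
) -> Tuple[float, float, float]:
    """Compare outcome rates between the new design and the existing design.

    Returns (difference in rates, z statistic, two-sided p-value),
    using the pooled-variance normal approximation.
    """
    if min(n_new, n_old) <= 0:
        raise ValueError("each arm needs at least one participant")
    p_new = successes_new / n_new
    p_old = successes_old / n_old
    pooled = (successes_new + successes_old) / (n_new + n_old)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_new + 1 / n_old))
    if se == 0:
        raise ValueError("outcome does not vary; the test is undefined")
    z = (p_new - p_old) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_new - p_old, z, p_value
```

With hypothetical counts of 240 completions out of 400 under the new design and 200 out of 400 under the existing design, the function returns a difference of 0.10, z of about 2.84 and a two-sided p-value of about 0.0045.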