Andy Grogan-Kaylor

Measure for Measure Series – More is Not Always Better: When calculating impact, the sample size may be smaller than you think

Editor’s note: Measure for Measure is a NextBillion series focusing on trends, tools and viewpoints in impact measurement. Check out The Big Idea page on NextBillion for additional posts in the series.

In the course of managing several recent impact assessment workshops, my colleagues (see their posts on NextBillion here and here) and I have consistently fielded versions of this question: How many people should I include in my assessment sample?

This is a central issue for any organization thinking about doing impact assessment. Very large samples may be too expensive for your organization to collect. Overly small samples may fail to give you sufficient confidence that any impacts you find reflect more than a chance difference (i.e., that they are statistically significant).

Consider the following:

  1. First, think about the level of statistical significance that you seek. For example, .05 is a common threshold value for statistical significance.
  2. Second, think about the desired statistical power, the probability that you will detect a result if there is indeed an impact; 90 percent is a common threshold for statistical power.
  3. Lastly, think about the size of the effect you seek to document. For continuous indicators, effect sizes are commonly expressed in standard deviation units, and it is worth remembering that when subject to rigorous quantitative evaluation, effects are often smaller than our experiences or stories might lead us to believe. Effects in the range of .2 to .3 standard deviations are considered small, but small effects are common, especially when a particular program is being quantitatively evaluated for the first time.

More stringent criteria for statistical significance, a higher desired level of statistical power, and smaller effect sizes all contribute to a need for larger samples. In a recent impact assessment workshop with African organizations in Johannesburg, South Africa, we discussed these ideas. Under reasonable assumptions about all of these quantities, an organization often finds that an evaluation study with a sufficient sample size requires somewhere in the neighborhood of 200 or so individuals in a treatment group, and 200 or so individuals in a control group.
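To make the three considerations concrete, here is a minimal sketch of a sample size calculation. It assumes a simple comparison of two equal-sized groups on a continuous indicator, a two-sided test, and the standard normal approximation; the function name n_per_group is hypothetical and chosen only for illustration. Dedicated power analysis software will give slightly different numbers, and real studies also need to allow for attrition and clustering.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.90):
    """Approximate sample size per group for comparing two group means.

    effect_size is in standard deviation units. Uses the normal
    approximation n = 2 * ((z_alpha + z_power) / effect_size) ** 2,
    which is an assumption of this sketch, not a method from the post.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = std_normal.inv_cdf(power)          # desired statistical power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.3, 0.5):
        print(f"effect size {d}: about {n_per_group(d)} per group")
```

With a significance level of .05 and 90 percent power, this approximation gives about 234 individuals per group for an effect size of .3, and about 526 per group for an effect size of .2, which is consistent with the rough figure of 200 or so per group for effects near the upper end of the small range.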

In thinking about these considerations, it is worth remembering that statistical significance and substantive significance are two different things. Not everything that is statistically significant is actually a practically meaningful change.
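One way to see the difference is to ask how small an effect a given sample can reliably detect. The sketch below assumes a two-group comparison with equal group sizes, a two-sided test, and a normal approximation; the function name minimum_detectable_effect is hypothetical. It is an illustration of the idea, not a substitute for a full power analysis.

```python
import math
from statistics import NormalDist


def minimum_detectable_effect(n_per_group, alpha=0.05, power=0.90):
    """Smallest effect (in standard deviation units) that a two-group
    study of this size would detect with the stated power, under a
    normal approximation (an assumption of this sketch)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return (z_alpha + z_power) * math.sqrt(2 / n_per_group)


if __name__ == "__main__":
    for n in (200, 1000, 10000):
        mde = minimum_detectable_effect(n)
        print(f"{n} per group: detectable effect of about {mde:.3f} SD")
```

With 10,000 individuals per group, the detectable effect falls below .05 standard deviations. A difference that small can be statistically significant in a very large sample and still be too small to matter in practice.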

Andy Grogan-Kaylor is an associate professor at the University of Michigan School of Social Work, and a research design and statistical methodologist at the Vivian A. and James L. Curtis School of Social Work Research and Training Center.
