Even in cases where it can be convincingly argued that the causal impact of an intervention can be identified, the output of the evaluation will be an estimated impact, and it is critical to characterise the uncertainty surrounding such estimates.
In many contexts, this can be difficult. One important area is where the data have a multi-level structure, and where there is potential serial correlation in treatment and in group-level shocks. The first of these is typical of “difference-in-difference” designs, which is perhaps the most widely used quasi-experimental design for programme evaluation.
Bertrand et al. (2004) highlighted the significant inference problems that can arise in difference-in-difference designs. In particular, they show that this issue becomes more challenging if the number of “groups” is relatively small, a situation often encountered in real data. While some progress has been made on these problems, particularly with bootstrap methods (e.g., Cameron et al., 2008; see also chapter 8 of Angrist and Pischke, 2009, and Donald and Lang, 2007), unsolved problems remain. In particular, the existing solutions have been demonstrated to be effective only when the number of groups is large. Unsurprisingly, therefore, many recent policy evaluations using difference-in-difference designs and conducted in the UK have not fully addressed these concerns about inference.
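To make the inference problem concrete, the sketch below illustrates one of the bootstrap approaches referenced above, the wild cluster bootstrap of Cameron et al. (2008), on simulated grouped data. This is an illustrative sketch only, not the project's method: the data-generating process, the number of groups, and all variable names are assumptions introduced here for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated design: G groups, n units per group, with a
# group-level shock that induces within-group correlation in errors.
G, n = 10, 30
group = np.repeat(np.arange(G), n)
treated = (group < G // 2).astype(float)          # half the groups treated
shock = rng.normal(0.0, 1.0, G)[group]            # common shock within each group
y = 0.0 * treated + shock + rng.normal(0.0, 1.0, G * n)   # true effect is zero

X = np.column_stack([np.ones(G * n), treated])

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def cluster_t(X, y):
    """t-statistic on the treatment coefficient with a
    cluster-robust (sandwich) variance estimate."""
    b = ols(X, y)
    e = y - X @ b
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in range(G):
        score = X[group == g].T @ e[group == g]   # per-cluster score
        meat += np.outer(score, score)
    V = bread @ meat @ bread
    return b[1] / np.sqrt(V[1, 1])

t_obs = cluster_t(X, y)

# Wild cluster bootstrap, imposing the null (treatment effect = 0):
# refit the restricted model, then resample by flipping the sign of
# each cluster's residuals with Rademacher weights drawn per cluster.
X0 = X[:, [0]]
b0 = ols(X0, y)
r0 = y - X0 @ b0

B = 999
t_boot = np.empty(B)
for b in range(B):
    w = rng.choice([-1.0, 1.0], size=G)[group]    # one weight per cluster
    t_boot[b] = cluster_t(X, X0 @ b0 + w * r0)

p_value = np.mean(np.abs(t_boot) >= np.abs(t_obs))
print(f"t = {t_obs:.2f}, wild-cluster bootstrap p = {p_value:.3f}")
```

The bootstrap p-value compares the observed t-statistic with its distribution under the imposed null, which is precisely where conventional cluster-robust standard errors can mislead when, as here, the number of groups is small.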
The objectives of Project 2, “Improving Inference for Policy Evaluation”, are to develop methods for inference in programme evaluations, and to disseminate these methods and related best practice in the area of inference to social scientists undertaking programme evaluations.