Troubled Families: How experimenting could teach us “what works?”

Jason Lowther

 

In this blog on 3rd Feb, I explored the formal Troubled Families Programme (TFP) evaluation and looked at the lessons we can learn in terms of the timing and data quality issues involved. This week I want to consider how designing a more experimental approach into this and future programmes could yield far more insight into what works where.

The idea of an “experimental” approach to policy and practice echoes Enlightenment-period thinkers such as Francis Bacon (1561–1626), who promoted an empirical system built on careful experimentation. Donald Campbell’s ideas[1] on ‘reforms as experiments’ argued that social reforms should be routinely linked to rigorous experimental evaluation. ‘Social engineering’ built on ‘social experiments’ became a popular concept in the USA and in social science.

Social experiments in America included work in response to a concern that providing even modest income subsidies to the poor would reduce motivation to find and keep jobs. Rossi and Lyall (1976) showed that work disincentives were in fact less than anticipated. In the field of prison rehabilitation, Langley et al. (1972) tested whether group therapy reduced re-offending rates. The results suggested that this approach to group therapy did not affect re-offending rates.

Unfortunately, meaningful experiments proved more difficult than anticipated to deliver in the field, and even robust experiments were often ignored by policy makers. As a result, until recently this experimental approach fell out of favour in social policy, except in the field of medicine.

The term ‘evidence-based medicine’ appears to have been first used by investigators from a US university in the 1990s where it was defined as ‘a systemic approach to analyze published research as the basis of clinical decision making.’ The evidence-based medicine movement considered experiments – specifically, collections of Randomised Controlled Trials (RCTs) subject to systematic reviews – as the “gold standard” of proof of whether interventions “work” or not.

Randomised controlled trials are sometimes not easy to undertake in social policy environments, but they can be done and they can provide surprising results. Starting in 2007, Birmingham City Council evaluated three evidence-based programmes in regular children’s services systems using RCTs[2]. We found that one programme (Incredible Years) yielded reductions in negative parenting behaviours among parents, reductions in child behaviour problems, and improvements in children’s relationships; whereas another (Triple-P) had no significant effects.

What was interesting for practitioners was that the children in all the trials had experienced improvements in their conduct. Only by use of a formal “control” group were we able to see that these “untreated” children were also improving, and so we were able to separate out the additional impacts of the intervention programmes.
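To make the point concrete, here is a minimal sketch in Python, using entirely invented numbers rather than the actual Birmingham trial data, of how comparing a treated group against a randomised control group separates background improvement from the additional effect of an intervention:

```python
# Illustrative only: these figures are invented for the sketch,
# not taken from the Birmingham trials.

# Mean child conduct-problem scores before and after the programme
# (lower = better), for families who received the intervention and
# for the randomised control group who did not.
treated_before, treated_after = 20.0, 14.0
control_before, control_after = 20.0, 17.0

# Both groups improve over time...
treated_change = treated_after - treated_before   # -6.0
control_change = control_after - control_before   # -3.0

# ...but only the comparison with the control group reveals the
# additional impact attributable to the intervention itself.
programme_effect = treated_change - control_change  # -3.0

print(f"Change in treated group:    {treated_change:+.1f}")
print(f"Change in control group:    {control_change:+.1f}")
print(f"Estimated programme effect: {programme_effect:+.1f}")
```

Because allocation is random, the two groups should differ, on average, only in whether they received the programme, so the final subtraction can reasonably be read as the programme’s additional impact.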

There are a number of lessons from this and other past experience that can help practitioners who want to run robust trials to test whether innovations are working (or not). The most important point is: build the evaluation testing into the design of the programme. The Troubled Families Programme could have built an RCT into its rollout – for example, by selecting first-year cases randomly from the list of families identified as eligible for the scheme, or by introducing the scheme in some council areas a year earlier than others. Councils could also have done this themselves by gradually rolling out the approach across different area teams.
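As a rough illustration of the first of these options, the sketch below (in Python, with hypothetical family identifiers and an assumed first-year capacity) shows how first-year cases could be drawn at random from the eligible list, leaving the remaining eligible families as a ready-made comparison group until they join later waves:

```python
import random

# Hypothetical list of families identified as eligible for the scheme.
eligible_families = [f"family_{i:04d}" for i in range(1, 1001)]

# Assumed capacity for the first year of the programme.
first_year_places = 400

# Draw the first-year cases at random; a fixed seed keeps the
# allocation reproducible and auditable.
rng = random.Random(2012)
first_year_group = set(rng.sample(eligible_families, first_year_places))

# The remaining eligible families form the comparison group until
# they are brought onto the programme in later waves.
comparison_group = [f for f in eligible_families if f not in first_year_group]

print(len(first_year_group), "families start in year one")
print(len(comparison_group), "families form the comparison group")
```

The same logic underpins the phased-rollout options: whichever council areas or area teams start later serve as the comparison group for those that start first.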

Sandra Nutley and Peter Homel’s review[3] of the New Labour government’s Crime Reduction Programme stressed the importance of balancing the tensions between fidelity to “evidence based” policy (to maximise the chance of impact) and innovation (to ensure relevance to the local context), short-term wins and long-term learning, and evaluator independence (to ensure rigour) versus engagement (to help delivery).

In my final blog on the TFP next time, I will explore the potential for “theory-based” approaches to evaluation to help us understand “what works and why?” in this and other policy areas.

Campbell, D. T. and Russo, M. J. (1999) Social experimentation. Sage Publications, Inc.

Langley, M., Kassebaum, G., Ward, D. A. and Wilner, D. M. (1972) Prison Treatment and Parole Survival.

Nutley, S. and Homel, P. (2006) ‘Delivering evidence-based policy and practice: Lessons from the implementation of the UK Crime Reduction Programme’, Evidence & Policy: A Journal of Research, Debate and Practice, 2(1), pp. 5-26.

Rossi, P. H. and Lyall, K. (1976) Reforming Public Welfare. New York: Russell Sage Foundation.

Sanderson, I. (2002) ‘Evaluation, policy learning and evidence‐based policy making’, Public administration, 80(1), pp. 1-22.

White, M. (1999) ‘Evaluating the effectiveness of welfare-to-work: learning from cross-national evidence’, Evaluating Welfare to Work. Report, 67.

[1] Campbell, D. T. and Russo, M. J. (1999) Social experimentation. Sage Publications, Inc.

[2] Little, M. et al. (2012) ‘The impact of three evidence-based programmes delivered in public systems in Birmingham, UK’, International Journal of Conflict and Violence, 6(2), pp. 260-272.

[3] Nutley, S. and Homel, P. (2006) ‘Delivering evidence-based policy and practice: Lessons from the implementation of the UK Crime Reduction Programme’, Evidence & Policy: A Journal of Research, Debate and Practice, 2(1), pp. 5-26.

 

 


 

Jason Lowther is a senior fellow at INLOGOV. His research focuses on public service reform and the use of “evidence” by public agencies.  Previously he led Birmingham City Council’s corporate strategy function, worked for the Audit Commission as national value for money lead, for HSBC in credit and risk management, and for the Metropolitan Police as an internal management consultant. He tweets as @jasonlowther

Troubled Families: Two Secrets to Great Evaluations

Jason Lowther

In this blog last week I explored the (rather flimsy) evidence base available to the developers of the original Troubled Families Programme (TFP) and the potential for “theory of change” approaches to provide useful insights in developing future policy. This week I return to the formal TFP evaluation and look at the lessons we can learn in terms of the timing and data quality issues involved.

The first secret of great evaluation: timing

The experience of the last Labour Government is very instructive here. New Labour presented themselves as strong advocates of evidence-based policy making and, in particular, were committed to extensive use of policy evaluation. Evaluated pilots were completed across a wide range of areas, including welfare, early years, employment, health and crime. These included summative evaluations of their outcomes and formative evaluations whilst the pilots were underway, attempting to answer the questions “Does this work?” and “How does this work best?”

Ian Sanderson provided a useful overview of Labour’s experience at the end of its first five years in power[i]. He found that one of the critical issues in producing great evaluations (as for great comedy) is timing. Particularly for complex and deep-rooted issues such as troubled families, it can take a significant time for even the best programmes to have an impact. We now know that the median time a family remained on the TFP was around 15 months.

It can also take significant time for projects to reach the “steady state” conditions under which they would operate when fully implemented. Testing whether there are significant effects can require long-term, in-depth analysis. This doesn’t fit well with the agenda of politicians or managers looking to learn quickly and sometimes to prove a point.

Nutley and Homel’s review[ii] of lessons from New Labour’s Crime Reduction Programme found that “projects generally ran for 12 months and they were just starting to get into their stride when the projects and their evaluations came to an end” (p.19).

In the case of the Troubled Families Programme, delivery started in April 2012, and most of the national data used in the evaluation relates to the 2013-14 financial year. Data on exclusions covered only families starting in the first three months of the programme, whereas data on offending, benefits and employment covered families starting in the first ten months of roll-out.

We know that 70% of the families were still part-way through their engagement with the TFP when their “outcomes” were counted, and around half were still engaged six months later.

It’s now accepted by DCLG that the formal evaluation was run too quickly and for too short a time. There just wasn’t time to demonstrate significant impacts on many outcomes.

The second secret: data quality

Another major element of effective evaluation is the availability of reliable data. Here the independent evaluation had an incredibly difficult job to do. The progress the evaluators made is impressive – for the first time matching a wide range of national data sets, local intelligence and qualitative surveys. But ultimately the data underpinning the evaluation is in places poor.

The evaluation couldn’t access data on anti-social behaviour from national data sets, as this is not recorded by the police. This is unfortunate given that the strongest evidence on the effectiveness of TFP-like (Family Intervention) programmes in the past concerns reducing crime and anti-social behaviour[iii].

A second chunk of data came from the 152 local authorities. This data was more up to date (October 2015), although only 56 of the councils provided data, which enabled matching for around one quarter of TFP families. The evaluation report acknowledges that this data was “of variable quality”. For example, the spread of academy schools without a duty to co-operate meant there were significant gaps in school attendance data. This will be a serious problem for future evaluations unless academies’ engagement with the wider public service system is assured.

In summary, the TFP evaluation covered too short a period and, despite heroic efforts by DCLG and the evaluators, was based on data of very variable quality and completeness.

Next time we will explore the “impact” evaluation in more detail – looking at how designing a more experimental approach into this and future programmes could yield more robust evaluation conclusions of what works where.

[i] Sanderson, I. (2002) ‘Evaluation, policy learning and evidence-based policy making’, Public Administration, 80(1), pp. 1-22.

[ii] Nutley, S. and Homel, P. (2006) ‘Delivering evidence-based policy and practice: Lessons from the implementation of the UK Crime Reduction Programme’, Evidence & Policy: A Journal of Research, Debate and Practice, 2(1), pp. 5-26.

[iii] DfE, “Monitoring and evaluation of family intervention services and projects between February 2007 and March 2011”, 2011, available at: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/184031/DFE-RR174.pdf

 

 


 

So: does the Troubled Families Programme work or not? – Part Two

Jason Lowther

In this blog last week I outlined the results of the “impact evaluation” element of the Troubled Families Programme (TFP) and the rather limited pre-existing evidence base on which the TFP had to be built. How can government build on existing evidence in designing its initiatives, and what can we do when there isn’t much in the evidence cupboard?

Many government programmes have the luxury of a relatively strong evidence base on which to build. The previous Labour government’s National Strategy for Neighbourhood Renewal and Sure Start programmes, for instance, could draw on decades of research (collated through the 18 Policy Action Teams) on urban initiatives and the impact of early years experiences on achievements in later life. These sometimes honoured the extant evidence more in theory than in practice[i], but at least they had foundations on which to build.

As evaluations of the Labour government’s Crime Reduction Programme found[ii], it is a difficult task to translate evidence that is often “fragmented and inconclusive” into practical government programmes. People skilled at this task are in short supply in central government.

But in the case of the TFP, the most robust element of the existing evidence base was a single evaluation, using a “control” group of 54 families, focussed on addressing anti-social behaviour through Family Intervention Projects. What can government do when the evidence base is thin?

One strong tradition, particularly around medicine and around welfare policies in the USA, has been the idea of “experimental government” using social experiments to determine whether (and if so how) innovative approaches work in practice. For example, in the last three decades of the 20th century, America’s Manpower Demonstration Research Corporation (MDRC) conducted 30 major random assignment experiments involving nearly 300,000 people.

Historically, randomised controlled trials (RCTs) were viewed by many as the “gold standard” of evaluation by allowing statistically robust assessments of “causality” – whether observed changes are due to the intervention being evaluated. More recent thinking emphasises that evaluations need to be designed in the best way to create robust evidence and answer specific questions. Often this will involve a mixture of methods, both quantitative and qualitative. The TFP evaluation used a mixture of methods but without building in a “control” group of “troubled families” not yet receiving the TFP interventions.

Granger[iii] argued, in the context of area-based initiatives, that the range and variety of initiatives and the scale of change in government mean that a strict statistical “control” is unfeasible. She argued that it is “virtually impossible” to achieve precise and clear-cut causal attribution, and that we need clear, strong theories as a basis for counterfactual reasoning and causal inference.

The TFP evaluation did not develop or test a “theory of change” for the programme. This is a pity, because rigorously testing a theory can help illuminate where and how programmes do (or don’t) have real impact.

There are several other lessons we can learn from the existing literature on evaluation in government, for example the importance of timing and data quality. We’ll look at these next time.

[i] Coote, A., Allen, J. and Woodhead, D. (2004) Finding Out What Works: Building Knowledge about Complex, Community-based Initiatives. London: King’s Fund, esp. pp. 17-18.

[ii] Nutley, S. and Homel, P. (2006) ‘Delivering evidence-based policy and practice: Lessons from the implementation of the UK Crime Reduction Programme’, Evidence & Policy: A Journal of Research, Debate and Practice, 2(1), pp. 5-26.

[iii] Granger, R. C. (1998) ‘Establishing causality in evaluations of comprehensive community initiatives’, New approaches to evaluating community initiatives, 2, pp. 221-46.

HS2: the importance of evidence

Rebecca O’Neill

Large infrastructure planning projects are often met with much controversy and debate. This is partly due to the risks involved and the conflicting views amongst actors. One such project is the proposed high-speed railway linking London with Birmingham, the North of England and potentially Scotland, better known as High Speed Two (HS2). After the project received an amber-red rating in May in the Major Projects Authority (MPA) annual report, there is every reason for people to be concerned. An amber-red project is defined as follows:

‘Successful delivery of the project is in doubt, with major risks or issues apparent in a number of key areas. Urgent action is needed to ensure these are addressed, and [to assess] whether resolution is feasible.’

So the questions that must be asked are: what evidence supports the project, and how should we analyse the debate? The evidence in favour of the project is largely based on predictive models and statistical data. One would think that after the financial crisis of 2008, people would not be so quick to base decisions on rational predictive models, or that after the cost overruns and benefit shortfalls of HS1 (the Channel Tunnel Rail Link) supporters of HS2 would be less optimistic in their forecasts. However, advocates believe that the project is both viable and necessary to tackle capacity problems on the West Coast Main Line.

There are a number of ways of analysing the debate. One such way is through an evidence-based policy making lens. This approach argues that once a policy problem is identified, research evidence will fill the knowledge gap, thus solving the problem. For advocates of evidence-based policy making, ‘the task of the researcher is to make accurate observations about objective reality, ensuring that error and bias are eliminated by isolating variables in order to be able to identify cause-effect relationships’. These experimental methods usually take the form of statistical analysis and rely heavily on quantitative data, so evidence must be about ‘facts’ that tend to prove or disprove a conclusion. Evidence-based policy making rests on the underlying positivist assumption that a value-free science is possible. It assumes that there is an objective truth ‘out there’ and that, if researchers adopt a certain approach, they will find the answer to the wicked issues and social problems we are facing.

If we utilise the evidence-based policy making approach then we must come to these conclusions:

  • The actors within HS2 are rational actors who have systematically collected scientific, rigorous evidence to support their claims, and their decisions are rational and value-free.
  • If there is a conflict of evidence, this is either because the actors have not behaved rationally, because they have allowed emotions and values to cloud their decisions, or because the evidence has flaws in terms of quality and methodology.
  • Those opposed to the project have an argument based on ideology and on less systematic and rigorous evidence.

However, I propose (along with many others) that the policy process is messy, that actors are rarely rational, that evidence is not necessarily ‘out there’ waiting to be found and that assuming more information will provide policy makers with the solution is wrong. The policy process is better viewed as an arena in which actors present claims and attempt to persuade their audience that these claims are true through the presentation of evidence and persuasion. The claims made by actors within the process are based on a variety of different evidence ranging from personal opinion to rigorous, scientific evidence. A good claims-maker will have mastered the art of appealing to a range of audiences, shaping and presenting their evidence in a way that best suits their audience. The concept of evidence-based policy making does not acknowledge the role of humans in this sense.

In the case of HS2, supporters claimed that the West Coast Main Line (WCML) was almost at full capacity and that the UK needed to modernise its railway infrastructure. They did not simply claim that building the line was the right thing to do; rather, they captured existing discourses within society, such as modernisation and economic growth. The claims-making framework enables us to explain why unfounded anecdotes can easily override rigorous scientific effort and investment. It also explains why some evidence is accepted over other evidence.

For a long time supporters of the project dismissed counter-claims and counter-evidence, arguing that NIMBYs were being selfish, that the project was for the greater good, and that opponents were blocking much-needed modernisation. However, more and more people are questioning the claims presented by HS2 Limited and its supporters. In practice, the philosophy of ‘what works’ often takes second place to what Russell and Greenhalgh describe as ‘experiential evidence, much of which was in the form of anecdotes or generalisations based on a person’s accumulated wisdom about the topic’. Claims-making theory therefore provides a robust theoretical framework for examining the process by which claims are made, received, denied through counter-claims, and reshaped. It also illustrates how claims and those who make them interact to formulate public policy.

o'neill

Rebecca O’Neill is a doctoral student looking at the role of evidence within High Speed Two. She has an interest in the conceptualisation of evidence, evidence-based policy making, the claims-making framework and interpretive approaches to research.