Journal Issue: Home Visiting: Recent Program Evaluations Volume 9 Number 1 Spring/Summer 1999

Understanding Evaluations of Home Visitation Programs
Deanna S. Gomby

Interpreting the Results

Methodologically strong evaluations with sensible analytic plans still require interpretation. Factors that may influence how much weight to give to a single evaluation include attrition from the study, the policy and practical relevance of its findings, and the likelihood that its results are generalizable across multiple settings.

Attrition

The articles in this journal issue report that between 20% and 67% of families leave the home visiting programs before their intended completion. Families leave for a variety of reasons, including mobility out of the community, a lack of interest, and, perhaps, a belief that they have already derived as much benefit as they can from the programs.

Such attrition may indicate that program modification is needed to increase family engagement, but it is also a sign that the evaluation could be weakened. If those who remain in the program are somehow different from those who have dropped out (perhaps because they are more motivated to seek improvement), then an evaluation that assesses only those families that remain in the program may overestimate the program’s benefits. Those persevering families might well have benefitted without home visiting services, because they might have sought out other community services or resources on their own.

Most methodologists believe that the most appropriate way to assess a program in the face of attrition is by measuring all the families, whether or not they receive the intended services, keeping them in their originally assigned groups.1 This is often called the intention-to-treat approach. It is expensive because it requires that evaluators locate families that have moved away from the area, and that they pursue individuals who may prefer to be left alone.
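The contrast between a completers-only comparison and an intention-to-treat comparison can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis drawn from any of the studies in this journal issue: every number in it (the group size, the outcome scale, the dropout rule, and a true program effect of zero) is hypothetical, and "motivation" stands in for any unmeasured trait that affects both dropout and outcomes.

```python
import random
import statistics


def simulate(n_per_group=5000, seed=0):
    """Compare intention-to-treat and completers-only estimates.

    Hypothetical setup: the program has NO true effect, outcomes depend
    only on an unmeasured 'motivation' trait plus noise, and less
    motivated families in the program group drop out.
    """
    rng = random.Random(seed)

    def outcome(motivation):
        # Outcome score driven by motivation, not by the program.
        return 50 + 5 * motivation + rng.gauss(0, 5)

    # Control group: every family is measured.
    control = [outcome(rng.gauss(0, 1)) for _ in range(n_per_group)]

    # Program group: every family is measured (intention-to-treat),
    # but only the more motivated families complete the program.
    assigned = []
    completers = []
    for _ in range(n_per_group):
        motivation = rng.gauss(0, 1)
        score = outcome(motivation)
        assigned.append(score)
        if motivation > -0.25:  # hypothetical dropout rule
            completers.append(score)

    control_mean = statistics.mean(control)
    itt = statistics.mean(assigned) - control_mean
    completers_only = statistics.mean(completers) - control_mean
    retention = len(completers) / n_per_group
    return itt, completers_only, retention


itt, completers_only, retention = simulate()
print(f"Retention in program group: {retention:.0%}")
print(f"Intention-to-treat estimate: {itt:+.2f} points")
print(f"Completers-only estimate:    {completers_only:+.2f} points")
```

Under these assumptions, the intention-to-treat estimate stays close to the true effect of zero, while the completers-only estimate shows an apparent benefit that comes entirely from who stayed in the program.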

This approach was attempted in most of the studies reported on in this journal issue, but not all of the studies were able to locate all of the individuals who disappeared from the program. Attrition from the evaluations ranged from 11% to 48%, hovering around 20% for most. In most cases, the program evaluators sought to demonstrate that the smaller groups remaining after attrition were initially equivalent to the larger groups originally enrolled in terms of background characteristics such as age, ethnicity, income, and education, but there is no way to tell whether the groups differed on intangibles such as motivation or drive.

Statistical Significance Versus Policy Relevance

Winning the battles of statistical significance, research rigor, and family engagement is not the same as winning the war of policy or practical relevance. Results should be examined to determine whether the questions they investigated are still timely. Since the studies of HIPPY and PAT reported in this journal issue were mounted, for instance, the service models have evolved (see Appendix D on pages 192–94 and Appendix B on pages 179–89 in this journal issue), and it is not clear how many program sites are still operating the older variations of the models rather than the newer ones.

Policymakers and practitioners should also consider the functional importance of a program’s results. A one- or two-point difference between very large groups on a paper-and-pencil test of parent attitudes or knowledge may be statistically significant, but it may have little or no importance for public policy or practice because the connection between results on that test and the outcome of ultimate interest (rates of child abuse and neglect) is too tenuous.
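The arithmetic behind this point can be sketched briefly. The example below assumes a hypothetical test with a standard deviation of 15 points and a 1-point difference between group means, and it uses a simple two-sample z-test with equal group sizes; none of these figures come from the studies in this journal issue.

```python
import math


def two_sample_z(mean_diff, sd, n_per_group):
    """Two-sided z-test for a difference in means between two equal-sized
    groups with a common, known standard deviation.

    Returns (z, p_value, standardized_effect), where the standardized
    effect is the mean difference divided by the standard deviation.
    """
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    standardized_effect = mean_diff / sd
    return z, p_value, standardized_effect


# Hypothetical values: a 1-point difference on a test with SD = 15.
for n in (50, 20000):
    z, p, d = two_sample_z(mean_diff=1.0, sd=15.0, n_per_group=n)
    print(f"n per group = {n:>5}: z = {z:.2f}, p = {p:.3g}, "
          f"standardized effect = {d:.3f}")
```

In this sketch the same 1-point difference is far from significant with 50 families per group and highly significant with 20,000 per group, yet the standardized effect (about 0.07 standard deviations) is identical in both cases. The significance test reflects sample size as much as the magnitude of the difference, which is why magnitude must be judged separately.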

Similarly, a decrease of 1% or 2% in teen pregnancy rates will not be nearly as persuasive to a policymaker as a 10% shift. Indeed, policymakers may not judge a program a success unless it generates effects of a particular size, often because it is only when effects are large enough in magnitude that they produce cost savings.

In their review of CCDP, St. Pierre and Layzer note a few outcomes in which differences were statistically significant but too small, in their opinion, to be meaningful in terms of children’s development. None of the other researchers used this approach, and policymakers and practitioners will undoubtedly bring their own lens to these studies.

Generalizability

A rigorous evaluation, unmarred by high attrition and revealing important outcomes, can help demonstrate that a program has worked in a particular setting. Unfortunately, no single evaluation can demonstrate that the program will work equally well in another setting. The abilities of administrators and home visitors will differ across agencies, and the needs of families will vary across communities. Ancillary community services may be of high quality in one community but not in another. The program itself may be modified in different settings—by intention, to better serve the needs of a new population; by accident on the part of new practitioners not familiar with the model; or by necessity, to keep costs down. The article by Duggan and colleagues reviewing Hawaii’s Healthy Start Program details the ways in which different agencies, all implementing the same model, operate it differently, with dramatic differences in client participation and outcomes.

All of the articles in this journal issue report the results of studies of a model at more than one site. The variability of results across sites suggests that generalizability of results may be limited.