Journal Issue: Low Birth Weight Volume 5 Number 1 Spring 1995
Analysis of Variations
Since 1973, when Wennberg and Gittelsohn demonstrated marked variations in the utilization of surgical procedures among hospital service areas within the state of Vermont, a large body of research has confirmed that both patterns of care and patient outcomes vary among geographic areas and hospitals in ways that cannot be explained by differences in the patient populations served.44 In addition to large differences in utilization rates for surgical procedures, diagnostic services, and hospital admissions, there is also wide variation in hospital mortality for a number of different medical conditions.45
In the following sections, we review the data concerning variations in interventions and outcomes for neonatal intensive care and discuss approaches to risk adjustment for neonatal patients that may be useful in determining the causes of the observed variations.
Variations in the Use of Interventions
Large variations in the use of prenatal corticosteroids exist despite their proven effectiveness in reducing morbidity and mortality among preterm infants. Corticosteroid treatment of women at risk for preterm delivery induces lung maturation in the fetus and improves neonatal outcomes.46 (For further discussion of the effectiveness of corticosteroids, see the article by Ricciotti in this journal issue.)
There is clear and convincing evidence from numerous randomized controlled trials that antenatal corticosteroid treatment not only reduces the risk of respiratory distress syndrome in preterm infants of treated women but also reduces the risk of death and intraventricular hemorrhage.47 Despite this evidence, many obstetricians prescribe antenatal steroids infrequently for women at risk for preterm delivery, and some obstetricians do not prescribe them at all. At 73 centers participating in the Vermont-Oxford Trials Network in either 1991 or 1992, 26% of the 8,749 infants weighing 501 to 1,500 grams (from 1 pound, 2 ounces to 3 pounds, 5 ounces) were born to women who had received antenatal steroids.48 Twenty-five percent of the centers in the network had treatment frequencies of 11% or less; 25% of centers had frequencies of 36% or more; only 10% of centers had frequencies of 60% or more. Data from the NICHD Neonatal Research Network also indicate wide variation in the use of antenatal steroid therapy for very low birth weight infants.40 Overall, 16% of infants in the NICHD Network were delivered to women who had received steroids, with a range of 1% to 33%.
Reports from these two neonatal networks also document substantial variation among neonatal intensive care units (NICUs) in the use of a number of other postnatal interventions and procedures. Table 1 shows the overall frequencies for selected interventions and their interquartile ranges at 68 centers which participated in the Vermont-Oxford Trials Network in 1992.49
The variation persisted even within 250-gram birth weight categories. Data for variation in postnatal interventions are also provided by the NICHD Neonatal Research Network. Methods of delivery room management, use of phototherapy, exchange transfusions, indwelling vascular catheters, and parenteral nutrition all exhibited considerable variation among the NICHD Network Centers.40 The persistence of variation within relatively narrow birth weight categories suggests that the variation is due in large part to differences in practice styles among the units.
Variations in Outcomes After Neonatal Intensive Care
As previously discussed, neonatal intensive care has resulted in increased birth-weight-specific survival rates and a decrease in the overall infant mortality rate. Infants born at hospitals with level 3 neonatal intensive care units have lower neonatal mortality than infants born at hospitals without such units.15-19 Even among level 3 neonatal intensive care units, however, there are substantial variations in both mortality and morbidity among the survivors.
Avery and colleagues found that the incidence of chronic lung disease in infants weighing 700 to 1,500 grams (from 1 pound, 9 ounces to 3 pounds, 5 ounces) varied significantly among the eight institutions studied even after adjusting for birth weight, race, and gender.50 The investigators suggested that the observed variation was due to differences in respiratory care practices among the centers. Horbar, in a study of 11 neonatal intensive care units, found differences among centers both in the frequency of chronic lung disease and in neonatal mortality.51 Again, the differences persisted after adjustment for birth weight, race, and gender. Kraybill and colleagues, in a survey of 10 neonatal units in North Carolina, found significant differences among centers in the frequency of chronic lung disease.52 They also suggested that differences in respiratory care practices might explain the findings. Hack and colleagues, reporting for the NICHD Neonatal Research Network, indicate that there are large intercenter differences in morbidity, particularly with respect to chronic lung disease, necrotizing enterocolitis, intraventricular hemorrhage, and jaundice.40 Wide variation in most morbidities has also been documented for centers in the Vermont-Oxford Trials Network (see Table 2).49
These data suggest that there are differences among neonatal intensive care units with respect to short-term morbidity and mortality. While some of these differences may be due to differences in the way specific outcomes are diagnosed at the different centers, the extent to which they are due to differences in the quality of medical care is unknown. Data regarding variation in long-term neurodevelopmental outcomes and other morbidities among centers are not available.
Variations in the outcomes of hospitalized patients have been used as indicators of the effectiveness of medical care. However, before inferences can be drawn from observed differences in mortality or other outcomes among hospitals, it is necessary to account for differences in case mix. Variation in hospital mortality has three major sources: the underlying risk of a hospital's patient population, the effectiveness and appropriateness of care provided at the hospital, and sampling variations (the likelihood that the mortality observed in the study group truly represents the experience in the total population).53 Statistical models for predicting mortality risk based on patient characteristics have been developed for use in a number of different clinical situations, including adult medical and intensive care, pediatric intensive care, and neonatal intensive care.54-58 These risk adjustment models can be used to compare the observed outcomes at a particular hospital with the outcomes that would be expected based on the demographic characteristics of the hospital's patients as well as the severity of their illnesses measured by physiologic and laboratory values. After differences in patient risk and sampling variations have been accounted for, residual variation in outcome is assumed to reflect differences in the effectiveness and/or appropriateness of medical care.
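The comparison of observed with expected outcomes can be illustrated with a minimal sketch. It assumes a logistic model of mortality risk; the coefficients, the choice of predictors, and the sample patients are hypothetical values chosen for illustration and are not taken from any of the published models cited above. The expected number of deaths at a hospital is the sum of the predicted probabilities of death for its patients.

```python
import math

# Hypothetical coefficients, for illustration only (not from a published model).
INTERCEPT = 3.0
COEF_PER_100G = -0.45  # change in log-odds of death per 100 g of birth weight
COEF_MALE = 0.3        # change in log-odds of death for male infants


def predicted_risk(birth_weight_g, male):
    """Predicted probability of death from the illustrative logistic model."""
    log_odds = (INTERCEPT
                + COEF_PER_100G * (birth_weight_g / 100.0)
                + COEF_MALE * (1 if male else 0))
    return 1.0 / (1.0 + math.exp(-log_odds))


def expected_deaths(patients):
    """Expected deaths = sum of predicted risks over a hospital's patients."""
    return sum(predicted_risk(weight, male) for weight, male in patients)


# Hypothetical case mix: (birth weight in grams, male?)
patients = [(600, True), (850, False), (1200, True), (1450, False)]
observed = 2  # hypothetical number of deaths actually observed
expected = expected_deaths(patients)
print(f"observed = {observed}, expected = {expected:.2f}")
```

A hospital that admits smaller infants will have a higher expected count under such a model, so comparing observed with expected deaths, rather than comparing crude mortality rates, removes the part of the variation that is attributable to the measured case mix.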
One of the earliest examples of risk adjustment for the evaluation of perinatal care was reported by Williams.37 He applied a model for predicting neonatal death based on birth weight, race, sex, and multiple birth to more than 3 million infants born at 504 hospitals in California during the years 1960 to 1973. After the model had been used to account for newborn risk and the effect of chance, there was still a twofold variation in mortality at these hospitals. This residual unexplained variation was presumably the result of differences in the effectiveness of perinatal care.
More recently, risk adjustment models have been developed specifically for neonatal intensive care. Richardson and colleagues have developed the Score for Neonatal Acute Physiology (SNAP), which is patterned after the Acute Physiology and Chronic Health Evaluation (APACHE) score used in adult intensive care and the Physiologic Stability Index (PSI) used in pediatric intensive care.59 The SNAP can be applied to all NICU admissions regardless of birth weight. The SNAP is predictive of neonatal mortality even within narrow birth weight strata and is correlated with other indicators of severity, including nursing workload, therapeutic intensity, and physician estimates of mortality risk. Furthermore, the SNAP increases the accuracy of neonatal mortality risk prediction when used along with birth weight, five-minute Apgar score, and size for gestational age.60 In the future, use of scoring systems such as the SNAP will help to refine the risk adjustment analyses and provide us with a clearer picture of variations in neonatal mortality across hospitals.
The International Neonatal Network has developed the Clinical Risk Index for Babies (CRIB), a scoring system for predicting mortality risk for infants weighing 1,500 grams (3 pounds, 5 ounces) or less.61 The CRIB score is based on birth weight, gestational age, maximum and minimum fraction of inspired oxygen, maximum base excess, and presence of congenital malformations. The score uses values obtained within 12 hours of admission. The CRIB score is more accurate than birth weight alone in predicting mortality risk, and higher scores are associated with an increased risk for major cerebral abnormality.
Because postadmission data may reflect the results of treatments provided in the neonatal intensive care unit rather than the infants' underlying risk, mortality prediction models based only on admission data are preferred if the goal of risk adjustment is to identify differences in the effectiveness of care. Both the SNAP and the CRIB score use information collected during the first 12 to 24 hours after admission to the neonatal intensive care unit for predicting mortality risk.
Figure 1 shows the standardized neonatal mortality ratios (SNMRs) at 68 centers participating in the Vermont-Oxford Trials Network in 1992 and illustrates the existing variation in mortality rates among these centers. In this model, which is based on factors present at the time of admission, the observed variations cannot be attributed to the infant's birth weight, race, gender, health at birth, receipt of prenatal care, or location of birth because the effects of these factors have been statistically controlled. The SNMR is the ratio of the number of observed deaths at a center to the number of deaths predicted based on the patient characteristics in the model. An SNMR of 1 means that a hospital has exactly the number of deaths that would be expected; values greater than 1 indicate that more deaths occurred than were expected; values less than 1 indicate that fewer deaths occurred than were expected. Although some hospitals have SNMRs that are less than 1 and others have SNMRs that are greater than 1, in most instances the 95% confidence interval includes the value 1, which means that the number of deaths at these hospitals does not differ significantly from the number expected. Improved predictor models which include major birth defects among the predictor variables are currently being developed.
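The SNMR and an interval around it can be sketched as follows. The interval method here is an assumption: it uses Byar's approximation to Poisson limits for the observed number of deaths, which is a common choice for standardized mortality ratios but is not necessarily the method used to produce Figure 1. The function names and the example counts are hypothetical.

```python
import math


def snmr(observed, expected):
    """Standardized neonatal mortality ratio: observed / expected deaths."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


def snmr_interval(observed, expected, z=1.96):
    """Approximate 95% interval for the SNMR.

    Assumption: observed deaths are treated as a Poisson count and the
    limits are obtained with Byar's approximation.
    """
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    if observed > 0:
        lower = observed * (1 - 1 / (9 * observed)
                            - z / (3 * math.sqrt(observed))) ** 3
    else:
        lower = 0.0
    shifted = observed + 1
    upper = shifted * (1 - 1 / (9 * shifted)
                       + z / (3 * math.sqrt(shifted))) ** 3
    return lower / expected, upper / expected


# Hypothetical center: 12 observed deaths, 10 expected from the model.
ratio = snmr(12, 10)
low, high = snmr_interval(12, 10)
includes_one = low <= 1.0 <= high
print(f"SNMR = {ratio:.2f}, interval = ({low:.2f}, {high:.2f}), "
      f"includes 1: {includes_one}")
```

In this hypothetical case the point estimate is above 1, but the interval still contains 1, so the center would not be distinguished from one performing exactly as expected.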
Neonatal mortality prediction could serve several purposes. One purpose is the prediction of individual patient risk. It is unlikely that any model will be accurate enough to aid in patient care decisions such as when to withhold or withdraw life support. However, prediction of individual risk may be useful for identifying infants who died despite having a low predicted probability of death. These cases could then be chosen for audit as part of local quality improvement efforts.
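A minimal sketch of selecting such cases for audit is shown below. The record fields and the 10% risk threshold are hypothetical; a unit would choose its own threshold and data layout.

```python
def audit_candidates(records, threshold=0.10):
    """Return IDs of infants who died despite a low predicted risk.

    `threshold` is a hypothetical cutoff on the predicted probability of death.
    """
    return [r["id"] for r in records
            if r["died"] and r["predicted_risk"] < threshold]


# Hypothetical records; predicted_risk would come from a risk model.
records = [
    {"id": "A", "predicted_risk": 0.62, "died": True},
    {"id": "B", "predicted_risk": 0.04, "died": True},
    {"id": "C", "predicted_risk": 0.03, "died": False},
    {"id": "D", "predicted_risk": 0.08, "died": True},
]
print(audit_candidates(records))
```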
A second purpose for neonatal risk prediction is the identification of outlier hospitals where the quality or effectiveness of care is low. Given the relatively small number of very low birth weight infants treated at individual neonatal intensive care units, the confidence intervals for estimates of measures like the SNMR will be large.62 This will severely limit the power of even very accurate statistical models to identify outlier hospitals. Aggregating cases over multiple years increases the ability to detect outliers. It remains to be proven, however, that targeting hospitals in this way accurately identifies units providing less effective care,63 as methods to adjust for the underlying risks and differences in the units remain imperfect.
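The effect of aggregating cases on precision can be sketched with a simple large-sample approximation, in which the standard error of the logarithm of the SNMR is roughly one over the square root of the number of observed deaths. This approximation, the function name, and the figure of 8 deaths per year are assumptions made for illustration. The ratio of the upper to the lower 95% limit then depends only on the number of deaths.

```python
import math


def interval_ratio(observed_deaths, z=1.96):
    """Upper limit divided by lower limit of an approximate 95% interval.

    Assumption: SE(log SNMR) is approximately 1 / sqrt(observed deaths).
    """
    if observed_deaths <= 0:
        raise ValueError("observed deaths must be positive")
    return math.exp(2 * z / math.sqrt(observed_deaths))


DEATHS_PER_YEAR = 8  # hypothetical count for a single unit

for years in (1, 3, 5):
    total = DEATHS_PER_YEAR * years
    print(f"{years} year(s), {total} deaths: "
          f"upper/lower = {interval_ratio(total):.2f}")
```

Under this approximation the interval narrows only with the square root of the number of deaths, which is why several years of data may be needed before an individual unit can be distinguished from the expected value.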
A third purpose for neonatal risk prediction models is their use in studies of hospital characteristics associated with outcome. The power to detect differences in risk-adjusted mortality rates among groups of hospitals within large neonatal networks will be greater than the power to detect individual outliers. Several studies have already shown that hospital characteristics are associated with outcomes for adult and neonatal patients.37,64 Williams, in the study discussed above, found that, after adjusting for patient risk, hospitals with larger delivery services, urban hospitals, hospitals performing above-average numbers of cesarean sections, those recording Apgar scores, and hospitals with higher specialist-to-generalist ratios had lower mortality rates.37 Conversely, hospitals with more Spanish-surnamed mothers and private proprietary hospitals had higher mortality rates. Paneth and colleagues have shown that risk-adjusted neonatal mortality rates at level 3 hospitals are lower than at either level 1 or level 2 hospitals in New York City.15,16 The International Neonatal Network has also shown that mortality rates adjusted for risk using the CRIB score are lower in tertiary than in nontertiary neonatal care units in the United Kingdom.57
We are currently using data from the Vermont-Oxford Trials Network to investigate whether hospital characteristics, services, and staffing patterns are associated with differences in mortality for very low birth weight infants. It is not known whether patient volume, teaching status, hospital ownership, and use of ancillary personnel such as neonatal nurse practitioners influence the costs and outcomes of neonatal intensive care. Because of trends toward deregionalization of care and changing patterns of referrals due to managed care, it will be increasingly important to understand how these factors affect both costs and outcome. Neonatal networks will be valuable laboratories for answering health services questions about the delivery of neonatal intensive care.