CARDIOVASCULAR JOURNAL OF AFRICA • Vol 23, No 10, November 2012
Discussion
This study compared the performance of four methods for profiling hospitals and assessed their agreement. The methods combined two Bayesian modelling approaches, fixed- and random-effects (hierarchical), with two ways of identifying outliers: two methods flagged a hospital whose risk-adjusted mortality rate exceeded some threshold, and two were based on a hospital's rank for its risk-adjusted mortality rate, each obtained from fitting both the fixed- and random-effects models. The agreement between the different methods was empirically examined using an extensive dataset of ACS patients.
Even though all the methods were able to classify hospitals
as low- and high-outcome outliers, profiling methods using
random-effects models were more conservative than fixed-
effects models in classifying hospitals as having better- or
worse-than-expected mortality. These findings were expected on
theoretical grounds and support the results from a multitude of
prior studies, showing that random-effects models identify fewer
performance outliers.8,11
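The conservatism of the random-effects approach stems from shrinkage: pooling across hospitals pulls each hospital's estimate toward the overall mean, most strongly for low-volume hospitals. A minimal sketch with simulated data (all numbers hypothetical, and with simple beta-binomial empirical-Bayes shrinkage standing in for a full hierarchical MCMC fit) illustrates why fewer outliers are flagged:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 20 hospitals with very different volumes and a
# common true 30-day mortality of 10% (purely illustrative numbers)
n = rng.integers(30, 800, size=20)       # patients per hospital
deaths = rng.binomial(n, 0.10)

# "Fixed-effects" view: each hospital judged on its own raw rate
raw_rate = deaths / n

# "Random-effects" view, approximated here by beta-binomial
# empirical-Bayes shrinkage toward a Beta(10, 90) prior (mean 0.10):
# low-volume hospitals are pulled hardest toward the overall mean
a, b = 10.0, 90.0
shrunk_rate = (deaths + a) / (n + a + b)

# Shrinkage moves extreme rates toward the centre, so fewer hospitals
# cross a fixed outlier threshold under the hierarchical view
threshold = 0.15
print("fixed-effects outliers: ", int(np.sum(raw_rate > threshold)))
print("random-effects outliers:", int(np.sum(shrunk_rate > threshold)))
```

Because the shrunken estimate is a volume-weighted average of the raw rate and the overall mean, any hospital flagged under shrinkage is also flagged on its raw rate, but not conversely.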
In the present study, the observed agreement in the methods’
classification of hospitals ranged from 90 to 98%, the highest
being between the methods within each effects model. The
agreement was excellent (κ = 0.77) in only one of the six comparisons. In the remaining five comparisons, the agreement was at best moderate (κ < 0.75).
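The reported κ values can be reproduced from the cross-tabulations in Table 3. As a check, the following sketch computes the (unweighted) Cohen's kappa for the one excellent agreement, between the hierarchical rank and hierarchical RAMR methods, assuming unweighted kappa was the statistic used:

```python
import numpy as np

# Cross-classification of the 128 hospitals by the hierarchical rank
# method (rows) against the hierarchical RAMR method (columns), taken
# directly from Table 3 (categories: low, normal, high)
table = np.array([
    [2,   0, 0],
    [4, 117, 0],
    [0,   0, 5],
])

n = table.sum()
po = np.trace(table) / n                                   # observed agreement
pe = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2  # chance agreement
kappa = (po - pe) / (1 - pe)
print(f"kappa = {kappa:.2f}")  # 0.77, matching the reported value
```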
Our findings relied on routinely collected clinical data. These
types of data suffer from incompleteness and inaccuracy of the
variables entered.31 In our preliminary investigation, 11% of
the total patients had missing codes on survival status. We did
not have full data for admission age, SBP, HR, ECG findings
and biochemical markers of the patients. Other risk variables
that may have been used also demonstrated missing data, thus
limiting the number of risk factors in the case mix adjustment
model on this occasion. However, our findings were shown to
be robust to which factors were included in the risk-adjustment
model. Indeed, difficult-to-obtain key clinical variables add little
to the predictive power of ACS risk scores.27
It may well be that the hospital performance variation
exhibited in this study was substantially contributed to by the
variation in definitions and data quality, as alluded to by Lilford et al.4 However, it is unlikely that these issues alone could account for the outcome variation found across the four analytical strategies examined.
We did not impute for missing data since other researchers
have shown that this does not affect the prediction model or
mortality.32 A more elaborate assessment of MINAP data quality and validity on the resulting classification of hospitals is the subject of a British Heart Foundation-funded project within our group, undertaken by Gale et al.33
For the present study, it suffices
to say that the number of patients analysed and the data used were
of sufficient quality to enable a comparison of different methods
to assess the hospitals’ performance for 30-day mortality among
ACS patients. However, we remain cautious regarding the exact
inference made for some hospitals, given their data quality.
We performed a limited sensitivity analysis over different prior specifications of the hospital random-effects variation and
threshold values. We found classification of outlying hospitals
was not affected by changes in the random-effects variations, but
it was slightly affected when the thresholds were changed.
A more elaborate sensitivity analysis would alter specification
of the hospital random-effects distribution as the assumed normal
distribution is not robust and flexible enough to account for
outlying hospital effects. Therefore it may be necessary in future
research to model the hospital effects more flexibly, for example
by heavy-tailed t-distributions to investigate both the sensitivity and robustness of the results, as in Manda,34 or by mixtures or non-parametric Dirichlet distributions, as in Ohlssen.35
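To illustrate why the normality assumption matters, the sketch below (simulated effects, not MINAP data) compares tail mass under a standard normal and a variance-matched t-distribution with 4 degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000

# Simulated hospital effects: a standard normal versus a t-distribution
# with 4 degrees of freedom, rescaled to unit variance
# (t(4) has variance df / (df - 2) = 2)
normal_eff = rng.normal(size=N)
t_eff = rng.standard_t(4, size=N) / np.sqrt(2.0)

# The t-distribution keeps far more mass in the tails, so a genuinely
# outlying hospital effect is less surprising under it and is shrunk
# less aggressively toward the mean
for name, eff in [("normal", normal_eff), ("t(4)", t_eff)]:
    print(f"P(|effect| > 3) under {name}: {np.mean(np.abs(eff) > 3):.4f}")
```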
The threshold level chosen and the required probability
of exceeding this threshold to classify a hospital using the
risk-adjusted mortality rate as an outlier were subjective and
completely arbitrary. We could have used other thresholds
and probabilities, as in Austin,12 which may have generated
stronger or weaker levels of agreement between the methods.
Furthermore, the requirement that intervals of the ranks must lie
entirely in the bottom or top quarters of ranks for the hospital to
be classified as an outlier was also arbitrary but has been used
before.11,12
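Both classification rules are straightforward to apply to posterior draws. The sketch below uses simulated draws in place of a fitted model; the threshold, required probability and 95% interval level are illustrative choices, not necessarily those of the study, though the top-quarter rank rule mirrors the one described above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for posterior draws of risk-adjusted mortality rates from a
# fitted model: 4 000 MCMC iterations for 30 hypothetical hospitals
n_iter, n_hosp = 4000, 30
rates = rng.normal(0.10, 0.02, size=(n_iter, n_hosp))

# Rule 1: flag a hospital if Pr(rate > threshold) is high enough.
# Threshold and required probability are illustrative choices.
threshold, required_prob = 0.12, 0.75
p_exceed = (rates > threshold).mean(axis=0)
high_by_threshold = np.where(p_exceed > required_prob)[0]

# Rule 2: flag a hospital only if its 95% rank interval lies entirely
# in the top quarter of ranks
ranks = rates.argsort(axis=1).argsort(axis=1) + 1   # 1 = lowest rate
lo, hi = np.percentile(ranks, [2.5, 97.5], axis=0)
high_by_rank = np.where(lo > 0.75 * n_hosp)[0]

print("flagged by threshold rule:", high_by_threshold)
print("flagged by rank rule:     ", high_by_rank)
```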
Results from any study on profiling hospitals’ performance
are predictably used to produce league tables of performance.
We are aware of the many criticisms surrounding the statistics
used in measuring performance and the subsequent ranking of
hospitals. We did not intend to contribute to this controversy.
Our aim was to describe and compare the performance of
four different Bayesian methods for institutional profiling. In
using ranks to compare hospitals, caution should be exercised
since most hospitals had considerably overlapping intervals,
which made it difficult to obtain reliable ranking, especially for
hospitals admitting fewer patients.
We follow Normand et al.,10 Marshall and Spiegelhalter,11 Austin12 and Ohlssen et al.18 in advocating the use of Bayesian methods, which, by pooling data across hospitals, handle the problem of small hospitals better than frequentist methods, for which a minimum number of patients is required before a hospital can be included.12
However, if we are willing to accept wide confidence intervals, the exact probabilistic methods can be used within a frequentist framework to handle small hospitals (see Luft and Brown36).
Furthermore, it is much easier within
Bayesian methods to determine uncertainty associated with
the ranks, which are very sensitive to sampling variations (see
TABLE 3. CLASSIFICATION OF HOSPITALS UNDER THE FIXED AND HIERARCHICAL MODELS

                           Fixed RAMR         Fixed rank         Hierarchical RAMR
                          Low  Norm  High    Low  Norm  High    Low  Norm  High
Fixed RAMR       Low        –     –     –     20     0     0      6    14     0
                 Normal     –     –     –      7    88     0      0    95     0
                 High       –     –     –      0     9     4      0     8     5
                                             κ = 0.71           κ = 0.46
Fixed rank       Low        –     –     –      –     –     –      6    21     0
                 Normal     –     –     –      –     –     –      0    96     1
                 High       –     –     –      –     –     –      0     0     4
                                                                κ = 0.44
Hierarchical     Low        2     0     0      2     0     0      2     0     0
rank             Normal    18    95     8     25    96     0      4   117     0
                 High       0     0     5      0     1     4      0     0     5
                          κ = 0.32           κ = 0.29           κ = 0.77

RAMR, risk-adjusted mortality rate; Norm, normal. Each κ is the agreement between the row and column methods; dashes mark comparisons shown elsewhere in the table.