a normal distribution with mean of 0 and variance 1 000. The hyper-precision, σ₀⁻² (the inverse of the hyper-variance, σ₀²), was given a gamma distribution with shape and rate parameters both equal to 0.001, implying that the hyper-precision had a mean of 1 and variance 1 000. This prior translates into a locally uniform distribution on the logarithm of the hyper-variance.
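As a quick numerical sanity check (not part of the original analysis), the stated prior mean and variance can be verified directly; the snippet below assumes the shape–rate parameterisation used by WinBUGS's dgamma, whereas SciPy's second argument is a scale (the reciprocal of the rate):

```python
from scipy.stats import gamma

# Gamma(0.001, 0.001) in the shape-rate parameterisation used by WinBUGS.
shape, rate = 0.001, 0.001
prior = gamma(a=shape, scale=1.0 / rate)  # SciPy takes scale = 1/rate

print(prior.mean())  # 1.0    -> prior mean of the hyper-precision
print(prior.var())   # 1000.0 -> prior variance of the hyper-precision
```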
We used two ways of identifying outliers: one based on the hospital's RAMR, and the other based on the rank of its RAMR among all the hospitals' RAMRs. Assessments of agreement were initially based on point estimates, between each hospital's ranks and between its risk-adjusted mortality rates. These pair-wise agreements could be assessed using Bland–Altman plots.²¹ However, we used simple two-way scatter plots, where agreement was judged against the line of equality.
We concentrated on categorising the different classification outcome measures into low, normal or high mortality risk, and then assessing agreement across the categories. In categorising a hospital's RAMR, we examined the probability of it exceeding a specified threshold. The overall 30-day mortality rate was 10.2% for our patient cohort. A hospital i is classified as a high outlier if Prob[RAMRᵢ > (1 + δ) × 10.2] ≥ 0.75 and, similarly, as a low outlier if Prob[RAMRᵢ < (1 − δ) × 10.2] ≥ 0.75; otherwise the hospital is classified as normal.
The threshold value δ can take any value, but values of 10, 15 and 20% are commonly used.¹⁸ We conservatively chose δ to be 20%, which has the effect of minimising the number of outlying hospitals; therefore hospital i is a high outlier if Prob(RAMRᵢ > 12.24) ≥ 0.75, and a low outlier if Prob(RAMRᵢ < 8.16) ≥ 0.75.
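This decision rule is straightforward to apply to posterior output. The sketch below is a minimal illustration rather than the paper's actual code; it assumes the MCMC draws of one hospital's RAMR have been exported (e.g. from WinBUGS via CODA) into a NumPy array, and the function and argument names are ours:

```python
import numpy as np

def classify_by_ramr(ramr_draws, overall_rate=10.2, delta=0.20, cutoff=0.75):
    """Classify one hospital from posterior draws of its RAMR (in %)."""
    p_high = np.mean(ramr_draws > (1 + delta) * overall_rate)  # Prob(RAMR_i > 12.24)
    p_low = np.mean(ramr_draws < (1 - delta) * overall_rate)   # Prob(RAMR_i < 8.16)
    if p_high >= cutoff:
        return "high outlier"
    if p_low >= cutoff:
        return "low outlier"
    return "normal"
```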
For ranks, we calculated Bayesian point estimates and 95% credible intervals of each hospital's rank. Hospitals whose 95% intervals fell entirely in the bottom or top quartile of ranks (i.e. upper limit ≤ 32.75 or lower limit ≥ 96.25) were classified as low or high outliers, respectively; otherwise they were normally performing hospitals.
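Rank intervals of this kind fall out of the same posterior sample: within each MCMC draw the hospitals are ranked, and percentiles of those ranks give the credible interval. A minimal sketch, assuming a draws-by-hospitals array of RAMR samples for the 128 hospitals (names are ours):

```python
import numpy as np

def classify_by_rank(ramr_draws):
    """ramr_draws: array of shape (n_draws, 128) of posterior RAMR samples."""
    # Rank hospitals within each draw (1 = lowest RAMR).
    ranks = ramr_draws.argsort(axis=1).argsort(axis=1) + 1
    lower = np.percentile(ranks, 2.5, axis=0)
    upper = np.percentile(ranks, 97.5, axis=0)
    labels = np.full(ramr_draws.shape[1], "normal", dtype=object)
    labels[upper <= 32.75] = "low outlier"   # interval wholly in bottom quartile
    labels[lower >= 96.25] = "high outlier"  # interval wholly in top quartile
    return labels
```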
With two modelling approaches (the fixed- and random-effects
models) plus two ways of classifying hospital performance, we
had four different methods for profiling hospitals. In all, there
were six possible pair-wise comparisons.
For each comparison, we used the kappa (κ) statistic to assess the amount of agreement between the methods. The statistic measures observed agreement in excess of that expected by chance, and we adopted the convention that κ > 0.75 indicates excellent agreement, κ = 0.4–0.75 indicates good agreement, and κ < 0.4 indicates marginal agreement,²² even though κ has been criticised for its limitations. In order to allow for different degrees of disagreement between the ordered categories, we used a weighted κ statistic.
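As an illustration of the weighted κ calculation (with made-up labels, not the study data), scikit-learn's cohen_kappa_score supports linear weights, which penalise a low-versus-high disagreement more heavily than a one-category one:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-hospital categories from two profiling methods,
# coded ordinally: 0 = low outlier, 1 = normal, 2 = high outlier.
method_a = [1, 1, 0, 2, 1, 1, 0, 1, 2, 1]
method_b = [1, 1, 0, 1, 1, 2, 0, 1, 2, 1]

kappa = cohen_kappa_score(method_a, method_b, weights="linear")
print(kappa)
```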
Implementation
The computation of the models was done using Markov chain Monte Carlo (MCMC) methods; specifically, we used Gibbs sampling as implemented in WinBUGS.²³ For each method considered, three parallel Gibbs sampler chains from independent starting positions were run for 50 000 iterations. For convergence, we monitored 10 randomly chosen random effects and, for the hierarchical models, also the hyper-parameters.
Trace plots of sample values of each of these parameters showed that they were converging to the same distribution. We formally assessed convergence of the three chains using Gelman–Rubin reduction factors,²⁴ and all were estimated near 1.0 by 15 000 iterations. We therefore took 15 000 iterations to be the burn-in period. For posterior inference, we used the remaining 35 000 iterations, giving a combined sample size of 105 000.
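For reference, the Gelman–Rubin potential scale reduction factor compares between-chain and within-chain variability; a value near 1.0 indicates the chains have mixed. A minimal sketch of the classic estimator (our own implementation, not WinBUGS's):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one scalar parameter.
    chains: array of shape (n_chains, n_iterations)."""
    n = chains.shape[1]
    B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)              # ~1.0 at convergence
```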
Results
Existing ACS risk scores include a multitude of factors. Patient age, systolic blood pressure (SBP), heart rate (HR) at admission and ECG findings are systematically included in most of the risk-scoring systems.²⁵⁻²⁷ In a large sample of European patients with ACS, age was found to impact on most of the clinical presentations and on hospital mortality.²⁸ Therefore, inclusion of age in a risk model would account for many of the baseline, prior and clinical risk factors.
The risk variables that we used in the case-mix logistic regression model for the risk adjustment are presented in Table 1, where age cut-off points were based on Rosengren et al.,²⁸ and SBP and HR on their fifths. The fitted model had an estimated c-statistic (area under the ROC curve) of 0.798, with a 95% confidence interval of 0.794 to 0.803. The inclusion of co-morbidities (e.g. diabetes and chronic renal failure) resulted in loss of data and only minor improvement in the c-statistic. Using only age, SBP and HR, whether continuous or categorised, resulted in a similar c-statistic of 0.777 (0.772–0.781).
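The c-statistic quoted here is simply the area under the ROC curve of the case-mix model's predicted death probabilities. A minimal sketch of such a computation, using simulated data in place of the study's patient records (variable names are ours):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Simulated case-mix design matrix (age band, SBP fifth, HR fifth dummies)
# and 30-day death indicator, standing in for the real cohort.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 12))
y = rng.integers(0, 2, size=1000)

model = LogisticRegression(max_iter=1000).fit(X, y)
c_stat = roc_auc_score(y, model.predict_proba(X)[:, 1])  # the c-statistic
print(c_stat)
```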
Using the predictive model of 30-day mortality shown in Table 1, we evaluated the expected number of deaths, Eᵢ, in hospital i to obtain its standardised mortality ratio, SMRᵢ = Oᵢ/Eᵢ, and risk-adjusted mortality rate, RAMRᵢ = 10.2 × SMRᵢ, which ranged from 4.54 to 19.44% with a median of 9.91%.
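In code, these two quantities follow directly from the case-mix model's predicted probabilities; a minimal sketch for a single hospital (function and argument names are ours):

```python
import numpy as np

def hospital_ramr(observed_deaths, predicted_probs, overall_rate=10.2):
    """SMR_i = O_i / E_i and RAMR_i = overall_rate * SMR_i (in %).
    predicted_probs: case-mix model death probabilities for the
    hospital's own patients."""
    expected = np.sum(predicted_probs)  # E_i
    smr = observed_deaths / expected    # SMR_i
    return overall_rate * smr           # RAMR_i
```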
Table 2 shows the top and bottom five ranked hospitals according to their risk-adjusted 30-day mortality rate. The top and bottom ranked 10 hospitals were more or less the same using only age, SBP and HR, but with a slightly wider range, 4.14 to 23.32%.
Comparisons of agreement between hospitals' risk-adjusted mortality rates, and between the ranks of those rates, from fitting the fixed- and random-effects models are shown in Fig. 1A, B. For each plot, lines of equality are shown, and comparisons are based on posterior medians. The observed agreement appears to be very poor between the risk-adjusted mortality rates. On the other hand, for the ranks, the points lie evenly around the line of equality, showing very good agreement. In both plots, agreement is very poor between the outcome measures for either low or high outlying hospitals. Furthermore, the plots show that the estimated outcome measures are more variable under the fixed-effects model.
The problems observed from using point estimates for assessing agreement can be partially mitigated by categorising the hospitals into low, normal and high performing. Comparisons based on categories of risk between the different methods are shown in Table 3. All methods were able to classify hospitals as low- and high-outcome outliers; however, only seven and 11 of the 128 hospitals were classified as such under the hierarchical rank and RAMR methods, respectively, while 31 and 33 were outliers under the fixed-effects rank and RAMR methods, respectively. As expected, profiling methods using hierarchical models were more conservative in classifying hospitals as performance outliers than were the non-hierarchical models.
The observed agreement in the methods' classifications of hospitals ranged from 90 to 98%, the highest being between the two hierarchical methods. In only one of the six