Quality Measurement mavens are reeling these days, as a result of the air being let out of high-profile measures such as tight glucose control, door-to-antibiotic time, and beta-blockers. Some critics have even suggested that we put a moratorium on new quality measures until the science improves.
I hope we don’t.
I think we’re seeing a natural, and fairly predictable, ebb and flow, and our reaction – even to these significant setbacks – should be thoughtful and measured. Here’s why:
The publication of the IOM’s Quality Chasm report (and McGlynn’s findings that we adhere to evidence-based practice about half the time) generated intense pressure to launch transparency and pay-for-performance initiatives. Finding virtually no outcome measures ready for prime time (the data collection burden was too large and the science of case-mix adjustment too immature), policymakers and payers logically looked to process measures (aspirin, ACE inhibitors, pneumovax) for common diseases (MI, CHF, pneumonia), delivered in settings (hospitals) that could be held accountable. And they sought levels of evidence that were, if not perfect, then at least good enough.
The National Quality Forum was created to vet this evidence. But the NQF has a problem not unlike that of the FDA: too low an evidence bar and bad measures become “law”; too high a bar and the ravenous hunger for quality measures goes unsated. Unsurprisingly, the demand for measures won out and the bar was set relatively low – not so much in terms of study design, but rather in terms of the degree to which initial promising studies had their findings confirmed by subsequent research.
With that as prelude, we shouldn’t be shocked by what we’re seeing now: a mini-pattern in which one or two tightly managed, single-site studies showing great benefit are followed by studies done in more diverse, real-world settings whose results are disappointing. It was ever thus. The difference is that now, by the time the later studies are published, the quality measures have long since been disseminated.
I won’t belabor the point since I’ve covered this ground previously in my discussions of the individual measures (such as glucose, beta-blockers, and door-to-antibiotics). But the fascinating trend to watch now is the beginnings of a Quality Measurement Backlash – it’s not a full-fledged, “spontaneous” Sean Hannity tea party just yet, but the night is young. Consider, for example, the Jerry Groopman/Pamela Hartzband article in last week’s Wall Street Journal:
In too many cases, the quality measures have been hastily adopted, only to be proven wrong and even potentially dangerous to patients….Yet too often quality metrics coerce doctors into rigid and ill-advised procedures. Orwell could have written about how the word “quality” became zealously defined by regulators, and then redefined with each change in consensus guidelines….
The solution, say the authors, is to stop the presses:
Before a surgeon begins an operation, he must stop and call a “time-out” to verify that he has all the correct information and instruments to safely proceed. We need a national time-out in the rush to mandate what policymakers term quality care to prevent doing more harm than good.
If that wasn’t enough fun for the quality measurers, the article by Rachel Werner and Bob McNutt in last week’s JAMA surely was. After critiquing today’s measures for the usual reasons, the authors suggest a “new approach”:
First, the focus of quality improvement initiatives should be on improving rather than measuring quality of care… Second, quality improvement initiatives should be tied to local actions and local results rather than national norms. This acknowledges that quality improvement efforts are not generalizable and one solution does not fit all.
…Quality improvement incentives can be restructured based on these principles. Current incentives are based on measured performance and are benchmarked to national norms. An alternative approach is to tie incentives to the local process of improving quality of care rather than the results of quality measures. This could take the form of requiring local teams of quality improvement personnel to identify problems through investigation, identify solutions to these problems, implement solutions, and document local improvement… A logical next step is to tie current quality improvement incentives to this approach—pay based on participation in quality improvement efforts rather than simply comparing each other on measures that do not reflect the learning that is required to really improve care.
The small, elite group of nationally recognized measure vetters is feeling increasingly besieged. You may have seen the comment on this blog by one of them, Dale Bratzler of the Oklahoma Foundation for Medical Quality (creator of many national measures, including those used in the Surgical Care Improvement Project), written in response to my post on tight glucose control. Bratzler wrote:
…I am tiring of some of the criticisms related to quality initiatives because the authors of those criticisms often fall victim to the same practices that they criticize. It seems to be increasingly common for opinion pieces, editorials, anecdotal reports, underpowered studies, and single-institution studies to be used to suggest that quality initiatives are resulting in widespread patient harm. Frankly, I have not seen systematic evidence of that for most national quality initiatives and in some cases, have data to suggest that for many of the conditions targeted in those initiatives, patient outcomes are slowly but progressively improving.
Bratzler goes on to state, correctly, that the glucose standard in SCIP was not the brutally tight 80-110 mg/dL, but rather a more generous (and less dangerous) <200 mg/dL. He then acknowledges that
…some hospitals undoubtedly go beyond the requirements of the SCIP measure and that could result in harm… But on a national basis, surgical outcomes actually are improving over time and there is no national requirement to implement programs of intensive blood sugar control.
That last point is technically accurate, but other national campaigns, or influential national organizations, have promoted tighter control than that recommended in SCIP. For example, the Surviving Sepsis Campaign targets a glucose level of <150 mg/dL, and the Institute for Healthcare Improvement’s target is the Van den Berghe standard of 80-110 mg/dL (although, to be fair, IHI stuck to the SCIP standards in its recently completed 5 Million Lives campaign).
But before we get too distracted by all these angels dancing atop an insulin pen, let’s take a step back and consider the big picture. We’ve seen that some widely disseminated and promoted performance measures haven’t worked out as intended – usually because subsequent evidence proved less impressive than the initial salvo.
And now we have Groopman and Hartzband arguing that we should take a “time out” on quality measures, leaving it to doctors to make their own choices since only they truly know their patients. Do we really believe that the world would be a better place if we went back to every doctor deciding by him or herself what treatment to offer, when we have irrefutable data demonstrating huge gaps between evidence-based and actual practice? Even when we KNOW the right thing to do (as in handwashing), we fail to do it nearly half the time! Do the authors really believe that the strategy should remain “Doctor Knows Best” – that the rest of us should just stay out of their collective hair? Pullease…
And if we agree that we need some measurement to catalyze improvement efforts, do we really want measures that can be met through elaborate dog-and-pony shows, with no demonstration of improved processes or outcomes? Really? Sure, the Joint Commission should check to be sure that hospitals use strong QI methods (a major interest of new Joint Commission prez Mark Chassin, BTW), but there has to be more, much more. A close colleague, one of the world’s foremost quality experts, wrote me about the Werner/McNutt article, finding it unhelpful
…because you still have the measurement problem – how are you going to know whether or not any of these actions are actually happening, and whether or not they are actually improving anything???
The sentence in the JAMA piece, “This could take the form of requiring local teams of quality improvement personnel to identify problems through investigation, identify solutions to these problems, implement solutions, and document local improvement” reminded this colleague of the TQM fad twenty years ago, when some accreditors and insurers began requiring documentation of “storyboards” of hospitals’ PDSA cycles. He recalled visiting some hospitals preparing for inspections and
…I would see the storyboards on the wards and they were just laughable in terms of what they presented and their supposed relation to cause-and-effect. It was all a charade, done to get through the inspection and nothing – or very close to nothing – meaningful was really being accomplished.
So, to the Dale Bratzlers of the world, I say, Courage! Keep it up. And don’t let the bums (including this one) get you (too) down. The bottom line is that we need quality measures, and we need rigorous research to create good ones. When we do create a flawed measure (an inevitability), let’s admit it and fix it. If hospitals have been pushed toward tight glucose control based on now-partly discredited evidence, let’s say so, improve the measure, and resolve to learn something from the experience – not just when we’re thinking about glucose measurement but also when we’re considering the strength of the evidence supporting other difficult-to-implement and potentially dangerous practices. Ditto door-to-antibiotic timing.
As for me, I’ll keep critiquing bad measures and pointing out when new science emerges that changes how we should think about existing measures. But I’ll continue to support thoughtful implementation of transparency programs and experiments using P4P. From where I sit, of all our options to meet the mandates to improve quality and safety, tenaciously clinging to a Marcus Welbian (and demonstrably low quality) status quo or creating tests that can be passed by appearing to be working on improvement seem like two of the worst.