The debate over pay for performance in healthcare gets progressively more interesting, and confusing. And, with Medicare’s recent launch of its value-based purchasing and readmission penalty programs, the debate is no longer theoretical.
Just in the past several months, we’ve seen studies showing that pay for performance works, and others showing that it doesn’t. We’ve heard from some theorists who describe P4P as sapping intrinsic motivation and doing violence to professionalism, and others who feel that its effects are as natural and predictable as water running downhill. Some commentators beg us to stop it, while others denounce P4P’s current incarnations as too wimpy to work and recommend they be turbo-charged.
If we weren’t talking about the central policy question of a field as important as healthcare, we could call this a draw and move on. But the stakes are too high, so it’s worth taking a moment to review what we know.
In the U.S., the main test of P4P has been Medicare’s Hospital Quality Incentive Demonstration (HQID) program. A recent analysis of this program, which offered relatively small performance-based bonuses to a sample of 252 hospitals in the large Premier network, found that, after 6 years, hospitals in the intervention group had no better outcomes than those (3363 hospitals) in the control arm. Prior papers from the HQID demonstrated mild improvements in adherence to some process measures, but – as in a disconcerting number of studies – this did not translate into meaningful improvements in hard outcomes such as mortality.
Contrast the HQID findings with recently published results from a 24-hospital study in northwest England, in which P4P (with higher bonuses, up to 4%) was associated with significant improvements in risk-adjusted mortality rates for patients with pneumonia, acute myocardial infarction, and heart failure. In other studies, substantial bonuses to British GPs (up to 30%) have been shown to be associated with improved adherence to process measures and intermediate outcomes such as control of hypertension and cholesterol.
The competing ideologies are as interesting as the dueling p values. It has become clear that the world is sorting itself into two camps: people rooting for P4P to flourish and others hoping that it crashes and burns. In the former camp are individuals who view doctors as economic creatures, nothing more, nothing less. They see protestations by physicians that “we can’t be bought” as both unbelievable and haughty. Such individuals find succor in the history of fee-for-service medicine. “We already have pay for performance,” I’ve been told on several occasions. “We pay more for the performance of procedures, hospitalizations, and office visits, and so that’s precisely what medicine produces.” Even for those rooting for better angels, this argument is hard to ignore.
In the opposite camp are those who point to medicine’s history as a noble profession. They note that one of the defining characteristics of professions is that they place the needs of those they serve over their own. In addition to this social-good argument, they tout empirical evidence from the trendy field of behavioral economics, which highlights the tension between intrinsic (driven by purpose, altruism, mastery) and extrinsic (driven by money) motivation. This research has demonstrated that not only do financial incentives frequently not work as well as one might like, they may even “crowd out” intrinsic motivation. The ever-present physician-gadflies Steffie Woolhandler and David Himmelstein, joined by behavioral economist Dan Ariely, highlight several of these arguments in a recent Health Affairs blog. They cite one study that found that incentive payments decrease the frequency of blood donations (as compared with voluntary donations) and another that found that parents became more likely to pick up their kids after-hours when an Israeli day care center imposed fines for late pickups. (“Fines had transformed promptness from a moral duty to a market transaction governed by price,” they write.)
This camp’s bible is Daniel Pink’s book Drive, and one of its prophets is my ABIM colleague Chris Cassel, who has argued, in two fine JAMA articles (coauthored by Sachin Jain), that financial incentives can suppress motivation, turning physicians from “knights” (individuals motivated by professional values) into “pawns” (passive participants doing backflips in response to external incentives). Another prominent member of this camp is Don Berwick, who addressed this issue before he became, well, Don Berwick. Writing in 1995, he argued:
I find myself an extremist and therefore suspicious of my answer. But it is, nonetheless, the best answer I have yet found regarding merit pay for doctors or any group of workers; namely, “Stop it.” [Such pay] is destructive of what we need most in our healthcare industry – teamwork, continuous improvement, innovation, learning, pride, joy, mutual respect, and a focus of all of our energies on meeting the needs of those who come to us for help. We can find better ways to decide on how we pay each other and better uses for our energies than in the study and management of carrots and sticks.
So where does this gumbo of empirical evidence and exhortation leave us? As with most really complicated questions in life, the right answer will be as utterly unsatisfying to those rooting for their home team as it is predictable: it’ll lie somewhere between the poles. To me, while the evidence supporting P4P in healthcare is weak, it is far too early to pull the plug on a strategy with so much face validity, particularly with all that’s hanging in the balance.
But boy, do we have lots of details to sort out. How much money should be in P4P bonuses and penalties? (Medicare’s current value-based purchasing plan pegs bonuses at 1%, doubling in a few years – much lower than most experts recommend to catch the attention of doctors and hospitals.) What is the right mix of payments that go to best performers versus best improvers? (Early programs gave bonuses only to the former, but the correct answer must involve a Solomonic splitting of the baby, and Medicare’s current value-based purchasing plan does just that.) What is the best blend of process and structural vs. outcome measures? (Medicare began with process measures, but value-based purchasing and other P4P programs are increasingly combining processes with risk-adjusted outcomes.) When does transparency get you far enough along that P4P is superfluous or simply not worth the hassle? (The biggest surprise in this area has been the power of simple transparency in driving change, particularly since it has been accompanied by relatively meager consumer-based changes in behavior. This is part of the reason why it has been hard to show benefit for hospital P4P in the U.S.: all “control” hospitals are participating in a vigorous transparency program, which has been strikingly successful.) How can we align everyone’s incentives? (P4P programs to date have mostly focused on hospitals or doctors, rarely both. Future programs should try to align these forces.) And – perhaps most important of all – is there a way to implement P4P without dousing the flame of intrinsic motivation and professionalism? (I have no clever answer to that one, but I suspect someone smarter than me will figure it out.)
In the final days of the presidential campaign, we saw the hazards of reading the evidence – in this case, the polls – through an ideological lens. Even as we debated the “Is-Nate-Silver-right?” question, we knew that the final answer was forthcoming, on the evening of November 6. With P4P, it won’t be quite that simple: there will not be any singular event that tells us that we have things structured just right to maximize benefit and minimize harm. So, for now, the best stance is to keep an open mind, listen to both sides of the argument, review the research in as unbiased a way as one can muster, and pray for more and better studies.
In the end, I’m guessing that the best solution will not be one that treats physicians as purely economic animals. They’re not. But – as much as we’d wish it to be so – it is equally unlikely to be one that relies completely on the kindness of strangers.