HMPI

Why AI Is Good for Our Health but May Hurt Our Wallets

Sneha S. Jain*, Stanford University School of Medicine, Morgan Cheatham*, Bessemer Venture Partners, Michael A. Pfeffer, Stanford University School of Medicine, Linda Hoff, Stanford Health, Nigam H. Shah, Stanford University School of Medicine

*These authors contributed equally to the work as co-first authors

Contact: snehashahjain@stanford.edu

Abstract

What is the message? Current regulatory frameworks, reimbursement structures, and business models for AI in healthcare are decoupled, creating an environment in which AI may significantly increase costs without necessarily improving outcomes. This misalignment stems from inadequate regulatory and business incentives for real-world performance evaluation of AI, as well as reimbursement gaps that push vendors toward pricing strategies that prioritize financial gains over improved quality and value in order to recoup development costs. The authors recommend three key reforms: rigorous pre- and post-deployment evaluation to verify proposed clinical value, development of assessment standards through shared guidelines, and strategic alignment of AI deployment modalities with sustainable business models to ensure these tools enhance care quality while responsibly managing healthcare costs.

What is the evidence? This paper cites over 50 sources, including academic literature, Food and Drug Administration documents and policy, written statements from the Centers for Medicare and Medicaid Services, and industry reports.

Timeline: Submitted: October 19, 2024; accepted after review November 1, 2024.

Cite as: Sneha Jain, Morgan Cheatham, Michael A. Pfeffer, Linda Hoff, Nigam H. Shah. 2024. Why AI Is Good for Our Health but May Hurt Our Wallets. Health Management, Policy and Innovation (www.HMPI.org). Volume 9, Issue 3.


Introduction and Overview

In 1995, Charlie Munger said “Show me the incentives, and I’ll show you the outcome.” Thirty years later, this holds true for artificial intelligence (AI) in healthcare. Current approaches to evaluating, regulating, and paying for AI in healthcare incentivize use of AI in a manner that is likely to increase the total cost of care. The problem stems from disconnected regulation and reimbursement approval decisions, anemic health information technology (IT) budgets, and complex revenue structures across stakeholders, which together necessitate creative business models to subsidize the adoption of AI tools. Often these business models are crafted independently of the workflow redesign necessary to capture the full potential of AI1.

Three federal agencies, the Food and Drug Administration (FDA), the Office of the National Coordinator for Health IT (ONC), and the Centers for Medicare & Medicaid Services (CMS), regulate AI tools in healthcare. These tools are regulated as software as a medical device (SaMD)2, as practice-management software under the Health Data, Technology, and Interoperability Certification Program (HTI-1)3, and under the Clinical Laboratory Improvement Amendments (CLIA). There is also software as a treatment modality, or digital therapeutics (DTx)4, which is subject to safety and efficacy evaluations. Additionally, other software and tech-enabled services, such as revenue cycle management, are procured directly by healthcare entities outside of regulatory purview and are subject only to enforcement by the Federal Trade Commission (FTC) or the Office for Civil Rights (OCR).

This patchwork of regulatory frameworks leads to inconsistent evaluation of the clinical impact of AI tools. For example, Wu et al. found that most FDA-cleared medical AI devices were evaluated pre-clearance through retrospective studies, with many not reporting the number of evaluation sites or sample sizes5. Chouffani El Fassi et al. found that almost half of authorized tools were not clinically validated and were not even trained on real patient data, concluding that FDA authorization is not a marker of clinical effectiveness6. In general, it is widely accepted that robust AI testing and validation infrastructure in medicine is lacking7 and that our regulatory regimens need to be updated.

Regulatory approvals examine whether AI tools “work”, but not whether they create “value” in the form of better quality of care for patients relative to the cost, which is what procurement and reimbursement decisions often weigh. They also do not consider how an AI tool will fit into existing or new workflows. This decoupling of regulatory approval from reimbursement requires users of AI tools – especially tools used to render medical care – to figure out how to pay for the cost incurred by using a tool based on the value obtained.

Lobig et al. recommend that reimbursement for an AI tool, if separate from the cost of the underlying imaging study, should be decided based on evidence of improved societal outcomes8. However, for regulated tools, the assignment of value to a reimbursed AI tool is artisanal at best9. Payment rates differ significantly between private and public payers. For example, Wu et al. found that reimbursement for CPT code 92229 for diabetic retinopathy screening is approximately 2.8 times higher for privately insured patients than for CMS patients9. There is little consistency in how reimbursement for AI tools used in medical care is valued relative to non-AI alternatives. For example, reimbursement for AI-based interpretation of breast ultrasound is comparable to that for a traditional breast ultrasound, whereas reimbursement for AI-powered cardiac CT for atherosclerosis is two to three times the out-of-pocket cost of such a study9,10. While mechanisms exist to facilitate reimbursement during the nascent stages of technology adoption, such as the New Technology Add-On Payment (NTAP)11 and Transitional Coverage for Emerging Technologies (TCET), these solutions are unlikely to fully accommodate the rapid growth of AI-based tools in healthcare12.

For non-regulated technology, adoption depends on market forces to identify high-value solutions and on incumbent vendor platforms to facilitate their use, both of which may vary by care setting and payment model (e.g., urban vs. rural, fee-for-service vs. value-based payment).

Therefore, as Davenport and Glaser note, despite abundant research and startups, very few AI tools have been adopted by healthcare organizations13. They attribute this to factors such as regulatory approval, reimbursement, return on investment, integration challenges, workforce education, the need to change workflows, and ethical considerations, and conclude that new organizational roles and structures will be necessary to adopt these technologies successfully. Many of these challenges stem from the hidden deployment costs of AI tools14.

To address these challenges, Adler-Milstein et al. emphasize the need to couple the creation of equitable tools, their integration into care workflows, and training of health care providers with strong regulatory oversight and financial incentives for adoption in a way that benefits patients 15.

Whether an AI tool is regulated or not, its developers (and users) currently have to conform to existing payment methods for technology or medical care. Thus, paying for technology ends up being a net-new cost to health IT budgets. For example, a per-user license for ambient scribe technology is not a form of direct ‘medical care’, and hence brings no new revenue to a provider. Existing ways to pay for the tools as ‘medical care’ – while having the potential to bring new revenue – are fraught with value judgments and still represent a net-new cost to payers. As a result, adoption of AI tools remains low compared to the hype around them. For example, Wu et al. find that even though the number of devices cleared under the FDA’s SaMD pathway exceeds 500, only two of them – one for assessing coronary artery disease and one for diagnosing diabetic retinopathy – had over 10,000 CPT claims reimbursed in a four-year period9.

The lack of suitable payment models for health AI tools has led to prioritizing solutions that offer financial over clinical benefits16. In addition, AI developers face high costs driven by compute, data, and large enterprise healthcare sales teams. However, IT budgets are not large enough to sustain the payback assumptions made when investing in the creation of AI tools17. The total health IT spend in the US is approximately $46 billion, and approximately 10% of this spend is captured by leading electronic health record vendors16,18, a figure that does not include the hardware and people needed to run their systems. The remaining budget covers software (both clinical and business systems), medical devices, imaging equipment, hardware and networking components, cybersecurity, and salaries for IT personnel, leaving little room to pay startups or incumbents creating AI tools. This results in immense pressure to find non-IT budgetary spend (such as re-allocating salaries and professional services) and to align with the way medical care is paid for. This tension has prompted a reevaluation of traditional business models – the overarching strategies companies use to create, deliver, and capture value from their solutions, such as software or services – and pricing models, which describe the specific mechanisms by which vendors monetize their solutions, such as recurring subscription fees, pay per use, or contingency-based pricing19.
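To make the budget arithmetic above concrete, here is a minimal back-of-envelope sketch in Python. The $46 billion total and the roughly 10% EHR-vendor share are the figures cited above; the split of the remaining budget across spending categories is purely hypothetical, chosen only to illustrate how little headroom may remain for net-new AI purchases.

```python
# Back-of-envelope sketch of US health IT budget headroom for new AI tools.
# The $46B total and ~10% EHR-vendor share are the figures cited above;
# the category split of the remainder is purely illustrative (hypothetical).

TOTAL_HEALTH_IT_SPEND = 46e9   # approximate annual US health IT spend (USD)
EHR_VENDOR_SHARE = 0.10        # share captured by leading EHR vendors

remaining = TOTAL_HEALTH_IT_SPEND * (1 - EHR_VENDOR_SHARE)  # ~$41.4B

# Hypothetical allocation of the remaining budget across existing commitments.
committed_shares = {
    "clinical and business software": 0.30,
    "medical devices and imaging equipment": 0.25,
    "hardware, networking, cybersecurity": 0.25,
    "IT personnel salaries": 0.15,
}

headroom = remaining * (1 - sum(committed_shares.values()))
print(f"Hypothetical headroom for net-new AI tools: ~${headroom / 1e9:.1f}B")
```

Under these assumed shares, only a few billion dollars remain uncommitted, which is the gap the pricing strategies below attempt to bridge.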

Given these constraints, health AI vendors are developing pricing strategies that align with existing payment paradigms for technology or for medical care. These strategies attempt to cover the high upfront costs of AI implementation, the long-term additive costs associated with ‘augment the human’ design paradigms, and the uncoupling of those who use the technology from those who pay for it. The challenge becomes clear when we cross-tabulate the pricing strategies (usage- or performance-based) with the payment paradigm (for technology or for medical care), as shown in the table below.

Table. Pricing Strategies for Health AI Tools. Examples of technology and medical AI tools that use either usage-based or performance-based pricing strategies.

| Pricing strategy | Technology | Medical |
|---|---|---|
| Usage-based | E.g., ambient scribe | E.g., AI-based screening for diabetic retinopathy |
| Performance-based | E.g., AI-powered coding and clinical documentation integrity | E.g., algorithm-guided post-acute management |

Usage-based pricing charges customers based on volume of utilization of an AI tool. The payment can be a direct payment by the customer (e.g., for a per-user license for ambient scribe) or via reimbursement (e.g., CPT code 92229 for AI-based screening for diabetic retinopathy). The first adds net-new cost to the IT budget while the second generates new revenue for providers but increases costs for payers and carries the risk of overuse via unnecessary screenings. When AI tools, like ambient scribes, do not generate revenue directly, costs are justified by indirect benefits, such as reduced physician burnout and potentially lower turnover, or downstream benefits such as better documentation for billing. However, in other instances, the expectation is that costs will be covered by having users of the technology see more patients in the time saved. If time is saved, physicians must decide whether to add a patient to their schedule or to keep their normal case load in hopes of providing better care20. In some cases, patients bear the cost via out-of-pocket fees, such as with AI-enhanced mammography interpretations 21. Usage-based pricing can create conflicting incentives, with vendors promoting increased utilization to boost revenue while their customers may limit usage arbitrarily to control costs.
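The two usage-based payment flows described above can be sketched as simple cost functions. All prices and volumes below are hypothetical placeholders (not actual vendor rates or CPT payment amounts); the sketch only illustrates why a per-user license is pure net-new cost while a reimbursed per-use model rewards volume.

```python
# Sketch of the two usage-based payment flows described above.
# All prices and volumes are hypothetical placeholders, not real rates.

def annual_license_cost(users: int, monthly_fee_per_user: float) -> float:
    """Direct payment model (e.g., per-user ambient scribe licenses):
    a net-new cost to the IT budget with no offsetting revenue."""
    return users * monthly_fee_per_user * 12

def screening_margin(uses: int, reimbursement_per_use: float,
                     vendor_fee_per_use: float) -> float:
    """Reimbursed model (e.g., a CPT-code-billed screening):
    new revenue for the provider, new cost for the payer, and a
    margin that grows with volume, which is the overuse risk."""
    return uses * (reimbursement_per_use - vendor_fee_per_use)

print(annual_license_cost(users=200, monthly_fee_per_user=300.0))
print(screening_margin(uses=5_000, reimbursement_per_use=45.0,
                       vendor_fee_per_use=25.0))
```

Because the provider's margin in the second function scales linearly with volume, the conflicting incentives noted above fall directly out of the arithmetic.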

Performance-based pricing distributes financial risk between AI developers and healthcare providers or payers, with payments tied to measurable outcomes (not necessarily clinical outcomes). Risk-sharing arrangements range from a base annual fee plus a share of the financial savings to payment solely from created financial savings. For example, an AI-augmented screening program to detect worsening heart failure (HF) may allow for early intervention by the care provider, reducing readmissions22. There can be a base fee for access to the software, with some percentage of the additional revenue generated by reducing a hospital’s readmission rates – and therefore decreasing the associated Medicare reimbursement penalty – going to the AI vendor. Similarly, an AI-powered coding and clinical documentation integrity system can analyze clinical documentation to suggest appropriate diagnosis codes, such as those that drive Risk Adjustment Factor (RAF) scores, and can increase compliance by identifying documentation that may be insufficient to support accurate coding for billing. There can be a base fee for access to the software plus some percentage of the additional revenue generated from improved coding accuracy going to the AI vendor.
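A minimal sketch of the risk-sharing arrangement described above, assuming hypothetical figures: the vendor receives a base fee plus a negotiated share of measured financial savings. The function name and all numbers are illustrative, not drawn from any actual contract.

```python
# Illustrative sketch of a risk-sharing (performance-based) arrangement:
# base fee plus a share of measured financial savings. All figures are
# hypothetical; real contracts define "savings" in negotiated terms.

def performance_based_payment(base_fee: float, measured_savings: float,
                              vendor_share: float) -> float:
    """Vendor payment = base fee + share of savings; the vendor's upside
    vanishes when no savings are measured, which is what ties its revenue
    to the chosen outcome metric."""
    return base_fee + vendor_share * max(measured_savings, 0.0)

# e.g., a heart-failure screening program that reduces a hospital's
# readmission-related Medicare penalty (hypothetical amounts).
payment = performance_based_payment(base_fee=100_000.0,
                                    measured_savings=1_200_000.0,
                                    vendor_share=0.15)
print(f"Vendor payment: ${payment:,.0f}")   # $280,000
```

Everything hinges on how “measured savings” is defined: as the next paragraph describes, a metric that rewards denials or higher-severity coding can be optimized at patients’ expense.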

Performance-based pricing has limitations. A vendor may be incentivized to assign codes that reflect higher illness severity than is clinically justified, a problem called ‘upcoding’ that is often reported with tools used by payers offering Medicare Advantage plans23. For example, FDA-cleared screening tools were allegedly used to add diagnosis codes to patient records even when no further care was rendered24. Performance-based pricing, when used to pay for AI tools that decide on access to post-acute care based on patient needs25, may override clinicians’ judgment and deny care to seniors in order to generate ‘cost savings’ in Medicare Advantage plans26. A recent ProPublica investigation reported on an AI-backed algorithm that can be adjusted to produce higher denial rates, marketed with the promise of saving $3 for every $1 spent on its use27.

A Path Forward

The core issues with health AI tools stem from their inadequate evaluation and their decoupled regulatory and reimbursement approval criteria. These challenges are compounded by how technology and medical care are currently paid for. The resulting pricing arrangements reflect the machinations currently necessary to get paid for the use of AI in healthcare, either via IT budgets or by aligning with fee-for-service or value-based care paradigms. However, these pricing paradigms reveal a concerning trend: AI can increase the total cost of care without improving healthcare quality, or worse, lead to care denials and possible care disparities in the quest to create “financial savings”. To navigate this situation, we make the following suggestions:

Conduct assessments to specify and verify benefits

Regardless of whether an AI tool is under FDA, ONC, or CMS regulation, before adoption it is necessary to ensure that the use of the tool improves healthcare quality, whether through more efficient operations, improved patient experience, or enhanced patient outcomes28,29. We need robust estimates of benefits prior to deployment, and then verification of that benefit after deployment, to ensure that the use of AI tools improves overall value. For example, healthcare systems can put in place local evaluation regimens to ensure that the use of AI tools is fair, useful, reliable, and monetarily sustainable28. An upfront ethics evaluation to assess unintended consequences is critical to avoid situations like those detailed in the examples above30. Additionally, impact assessments should examine financial sustainability to address the disconnect between regulation and current reimbursement for clinical AI tools31. In situations where there is no short-term financial benefit – typically defined as return on investment in one year or less – there may be intangible benefits such as improved provider wellness32 or better long-term patient outcomes. If estimated, these can form the foundation for advocating for the adoption of certain AI products. At a minimum, an upfront evaluation of fairness, usefulness, reliability, and monetary sustainability can prevent organizational waste in the form of pilotitis – where hundreds of pilot projects happen and none converts to a broad implementation33,34. Finally, given that hundreds of health AI tools have been approved on the basis of limited clinical data35, there is a related urgent need to institute ongoing evaluation of health AI tools that are already in use36.

Create consensus on mechanisms of transparent evaluation

Given that value from the use of an AI tool is notoriously difficult to define, and is an interplay of a tool’s performance with the care workflows in which it is used, it is necessary to evaluate both37. AI tools (or the underlying models) should be subject to certain manufacturing constraints – as is already the case with FDA-regulated AI tools. For those AI tools that are currently not regulated, consensus best practices are needed around their creation, testing, and reporting38,39. Because best practices are typically offered in the form of checklists and reporting guides, adherence to them remains challenging40. The necessary next step is to facilitate the routine use of these desiderata (as well as verification of a vendor’s adherence to them), which can initially be done via a nationwide network of assurance labs41 and can gradually be transitioned into assurance software that is widely shared for self-service use. For example, Epic Systems has already taken a first step in this direction, with two academic groups contributing code to the software42,43. Finally, the tool’s performance in the context of its workflow can only be evaluated in the local setting, for which we need to create consensus assurance guidelines44, shared open-source software45, communities of practice (such as the Health AI Partnership46 and RAISE47) to develop implementation best practices, and centers that can evaluate clinical effectiveness48. The creation of common, accepted practices can ensure the evaluation process for AI is as cost-efficient as possible.

Align AI implementation with appropriate business models

Strategic alignment of AI implementation with the right business model is crucial for cost-effectiveness and value creation in healthcare. Modality-business model-market fit49 – how the choice of the form AI takes, combined with the business model that supports it, determines the value potential of the resulting solution – is the operative concept. By selecting appropriate modalities, such as AI-enabled software, copilots, or diagnostic or therapeutic tools, and aligning them with appropriate business models, stakeholders can generate value without unnecessarily inflating costs. For example, AI copilots (ambient scribes being one example), integrated into existing workflows and funded by current budgets, can enhance efficiency without requiring significant infrastructural changes. Hospitals that already allocate resources for human scribes can reallocate that budget to IT for a transition to AI alternatives, improving the consistency and quality of clinical documentation while operating within established budgets. An AI agent conducting a post-discharge follow-up workflow50, or an agent performing medication titration (such as a voice agent managing insulin dosage using data from a continuous glucose monitor51), can off-load work from burned-out and overworked nurse care managers, allowing reallocation of time to other tasks. Other modalities, such as the AI-augmented screening tool for heart failure52, may require the creation of new workflows in a value-based care setting, so that avoidance of later complications is prioritized at the population health level.

The value created from AI in healthcare will depend on how well we balance technology innovation, the incentives created by the complex payment structures in healthcare, the payback expectations of those investing in technology creation, the business models adopted by AI vendors, measurable clinical benefit, and associated healthcare costs. We must balance appropriate oversight with flexible infrastructure that continues to support innovation. Current business models of AI tools that impact medical care are square pegs in the two round holes by which medical care is paid for, i.e., fee-for-service and value-based payment methods. For pure technology plays, the available IT budgets might be missing a zero or two. To bridge this gap, we need focused efforts to connect high-quality evaluation of benefits, business model choice, regulation, and reimbursement for promising, high-value emerging technologies in order to finally achieve the promise of health IT lowering the cost of healthcare.

References

[1]   Mullangi S, Ibrahim SA, Shah NH, Schulman KA. A Roadmap To Welcoming Health Care Innovation. Health Affairs Forefront. doi:10.1377/forefront.20191119.155490

[2]   Center for Devices and Radiological Health. Software as a Medical Device (SaMD). U.S. Food and Drug Administration. Published August 9, 2024. https://www.fda.gov/medical-devices/digital-health-center-excellence/software-medical-device-samd. Accessed October 14, 2024

[3]   Health Data, Technology, and Interoperability: Certification Program Updates, Algorithm Transparency, and Information Sharing. Federal Register. Published January 9, 2024. https://www.federalregister.gov/documents/2024/01/09/2023-28857/health-data-technology-and-interoperability-certification-program-updates-algorithm-transparency-and. Accessed October 14, 2024

[4]   What is a DTx? Digital Therapeutics Alliance. Published September 15, 2022. https://dtxalliance.org/understanding-dtx/what-is-a-dtx/. Accessed October 14, 2024

[5]   Wu E, Wu K, Daneshjou R, Ouyang D, Ho DE, Zou J. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nat Med. 2021;27(4):582-584. doi:10.1038/s41591-021-01312-x

[6]   Chouffani El Fassi S, Abdullah A, Fang Y, Natarajan S, Masroor AB, Kayali N, et al. Not all AI health tools with regulatory authorization are clinically validated. Nat Med. Published online August 26, 2024:1-3. doi:10.1038/s41591-024-03203-3

[7]   Lenharo M. The testing of AI in medicine is a mess. Here’s how it should be done. Nature. 2024. doi:10.1038/d41586-024-02675-0

[8]   Lobig F, Subramanian D, Blankenburg M, Sharma A, Variyar A, Butler O. To pay or not to pay for artificial intelligence applications in radiology. NPJ digital medicine. 2023;6(1). doi:10.1038/s41746-023-00861-4

[9]   Wu K, Wu E, Theodorou B, Liang W, Mack C, Glass L, et al. Characterizing the Clinical Adoption of Medical AI Devices through U.S. Insurance Claims. NEJM AI. Published online November 9, 2023. doi:10.1056/AIoa2300030

[10] Yetman D. The Cost of a Coronary Calcium Scan on Your Heart. Healthline. Published November 9, 2022. https://www.healthline.com/health/heart/coronary-calcium-scan-cost. Accessed October 19, 2024

[11] New Medical Services and New Technologies. Centers for Medicare & Medicaid Services. https://www.cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps/new-medical-services-and-new-technologies

[12] Final Notice — Transitional Coverage for Emerging Technologies (CMS-3421-FN). https://www.cms.gov/newsroom/fact-sheets/final-notice-transitional-coverage-emerging-technologies-cms-3421-fn. Accessed October 9, 2024

[13] Davenport TH, Glaser JP. Factors governing the adoption of artificial intelligence in healthcare providers. Discov Health Syst. 2022;1(1):4. doi:10.1007/s44250-022-00004-8

[14] Morse KE, Bagley SC, Shah NH. Estimate the hidden deployment cost of predictive models to improve patient care. Nat Med. 2020;26(1). doi:10.1038/s41591-019-0651-8

[15] Adler-Milstein J, Aggarwal N, Ahmed M, Castner J, Evans BJ, Gonzalez AA, et al. Meeting the Moment: Addressing Barriers and Facilitating Clinical Adoption of Artificial Intelligence in Medical Diagnosis. NAM perspectives. 2022;2022. doi:10.31478/202209c

[16] Healthcare IT Spending: Innovation, Integration, and AI. Bain. Published September 17, 2024. https://www.bain.com/insights/healthcare-it-spending-innovation-integration-ai/. Accessed October 14, 2024

[17] Cahn D. AI’s $600B Question. Sequoia Capital. Published June 20, 2024. https://www.sequoiacap.com/article/ais-600b-question/. Accessed October 14, 2024

[18] Bruce G. Epic’s revenue over the past 5 years. Becker’s Hospital Review. https://www.beckershospitalreview.com/ehrs/epics-revenue-over-the-past-5-years.html. Accessed October 14, 2024

[19] Generative AI in healthcare: Adoption trends and what’s next. McKinsey & Company. https://www.mckinsey.com/industries/healthcare/our-insights/generative-ai-in-healthcare-adoption-trends-and-whats-next#/

[20] Hattiesburg Clinic doctors say ambient AI lowers stress, burnout. American Medical Association. Published August 15, 2024. https://www.ama-assn.org/practice-management/digital/hattiesburg-clinic-doctors-say-ambient-ai-lowers-stress-burnout. Accessed October 14, 2024

[21] RadNet expects to log upward of $18M in revenue from its AI division this year. Radiology Business. Published June 13, 2023. https://radiologybusiness.com/node/239941. Accessed October 9, 2024

[22] GE HealthCare to acquire Caption Health, expanding ultrasound to support new users through FDA-cleared, AI-powered image guidance. https://www.gehealthcare.com/about/newsroom/press-releases/ge-healthcare-to-acquire-caption-health-expanding-ultrasound-to-support-new-users-through-fda-cleared-ai-powered-image-guidance-. Accessed October 14, 2024

[23] Geruso M, Layton T. Upcoding: Evidence from Medicare on Squishy Risk Adjustment. National Bureau of Economic Research; 2015. doi:10.3386/w21222

[24] Ross C, Lawrence L, Herman B, Bannow T. How UnitedHealth turned a questionable artery-screening program into a gold mine. STAT. Published August 7, 2024. https://www.statnews.com/2024/08/07/unitedhealth-peripheral-artery-disease-screening-program-medicare-advantage-gold-mine/. Accessed October 11, 2024

[25] Herman B, Ross C. Buyer’s remorse: How a Medicare Advantage business is strangling one of its first funders. STAT. Published March 13, 2023. https://www.statnews.com/2023/03/13/medicare-advantage-plans-artificial-intelligence-select-medical/. Accessed October 11, 2024

[26] Ross C, Herman B. Denied by AI: STAT series honored as 2024 Pulitzer Prize finalist. STAT. Published May 8, 2024. https://www.statnews.com/denied-by-ai-unitedhealth-investigative-series/. Accessed October 11, 2024

[27] Christian Miller T, Rucker P, Armstrong D. “Not Medically Necessary”: Inside the Company Helping America’s Biggest Health Insurers Deny Coverage for Care. ProPublica. Published October 23, 2024. https://www.propublica.org/article/evicore-health-insurance-denials-cigna-unitedhealthcare-aetna-prior-authorizations. Accessed October 23, 2024

[28] Callahan A, McElfresh D, Banda JM, Bunney G, Char D, Chen J, et al. Standing on FURM ground: A framework for evaluating fair, useful, and reliable AI models in health care systems. NEJM Catal Innov Care Deliv. 2024;5(10). doi:10.1056/cat.24.0131

[29] Patel MR, Balu S, Pencina MJ. Translating AI for the Clinician. JAMA. Published online October 15, 2024. doi:10.1001/jama.2024.21772

[30] Mello MM, Shah NH, Char DS. President Biden’s Executive Order on Artificial Intelligence-Implications for Health Care Organizations. JAMA. 2024;331(1). doi:10.1001/jama.2023.25051

[31] Jain SS, Mello MM, Shah NH. Avoiding financial toxicity for patients from clinicians’ use of AI. N Engl J Med. 2024;391(13):1171-1173. doi:10.1056/NEJMp2406135

[32] Garcia P, Ma SP, Shah S, Smith M, Jeong Y, Devon-Sand A, et al. Artificial intelligence-generated draft replies to patient inbox messages. JAMA Netw Open. 2024;7(3):e243201. doi:10.1001/jamanetworkopen.2024.3201

[33] Scarbrough H, Sanfilippo KRM, Ziemann A, Stavropoulou C. Mobilizing pilot-based evidence for the spread and sustainability of innovations in healthcare: The role of innovation intermediaries. Soc Sci Med. 2024;340. doi:10.1016/j.socscimed.2023.116394

[34] Kuipers P, Humphreys JS, Wakerman J, Wells R, Jones J, Entwistle P. Collaborative review of pilot projects to inform policy: A methodological remedy for pilotitis? Aust New Zealand Health Policy. 2008;5. doi:10.1186/1743-8462-5-17

[35] Rakers MM, van Buchem MM, Kucenko S, de Hond A, Kant I, van Smeden M, et al. Availability of Evidence for Predictive Machine Learning Algorithms in Primary Care: A Systematic Review. JAMA Netw Open. 2024;7(9):e2432990-e2432990. doi:10.1001/jamanetworkopen.2024.32990

[36] Shah NH, Pfeffer MA, Ghassemi M. The Need for Continuous Evaluation of Artificial Intelligence Prediction Algorithms. JAMA Netw Open. 2024;7(9):e2433009-e2433009. doi:10.1001/jamanetworkopen.2024.33009

[37] Shah NH, Milstein A, Bagley SC. Making Machine Learning Models Clinically Useful. JAMA. 2019;322(14). doi:10.1001/jama.2019.10306

[38] Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385. doi:10.1136/bmj-2023-078378

[39] Bedi S, Jain SS, Shah NH. Evaluating the clinical benefits of LLMs. Nat Med. 2024;30(9). doi:10.1038/s41591-024-03181-6

[40] Lu JH, Callahan A, Patel BS, Morse KE, Dash D, Pfeffer MA, et al. Assessment of Adherence to Reporting Guidelines by Commonly Used Clinical Prediction Models From a Single Vendor: A Systematic Review. JAMA network open. 2022;5(8). doi:10.1001/jamanetworkopen.2022.27779

[41] Shah NH, Halamka JD, Saria S, Pencina M, Tazbaz T, Tripathi M, et al. A Nationwide Network of Health AI Assurance Laboratories. JAMA. 2024;331(3):245-249. doi:10.1001/jama.2023.26930

[42] Fox A. Epic leads new effort to democratize health AI validation. Healthcare IT News. Published May 28, 2024. https://www.healthcareitnews.com/news/epic-leads-new-effort-democratize-health-ai-validation. Accessed October 14, 2024

[43] Seismometer Documentation. https://epic-open-source.github.io/seismometer/. Accessed October 14, 2024

[44] Beavins E. CHAI releases draft framework of quality assurance standards for healthcare AI. FierceHealthcare. Published June 26, 2024. https://www.fiercehealthcare.com/ai-and-machine-learning/chai-releases-draft-rubric-quality-assurance-standards-healthcare-ai. Accessed October 14, 2024

[45] Wornow M, Gyang RE, Callahan A, Shah NH. APLUS: A Python library for usefulness simulations of machine learning models in healthcare. J Biomed Inform. 2023;139. doi:10.1016/j.jbi.2023.104319

[46] Health AI Partnership: an innovation and learning network for health AI software. Duke Institute for Health Innovation. Published December 23, 2021. https://dihi.org/health-ai-partnership-an-innovation-and-learning-network-to-facilitate-the-safe-effective-and-responsible-diffusion-of-health-ai-software-applied-to-health-care-delivery-settings/. Accessed October 14, 2024

[47] Goldberg CB, Adams L, Blumenthal D, Brennan PF, Brown N, Butte AJ, et al. To Do No Harm — and the Most Good — with AI in Health Care. NEJM AI. Published online February 22, 2024. doi:10.1056/AIp2400036

[48] Longhurst CA, Singh K, Chopra A, Atreja A, Brownstein JS. A call for artificial intelligence implementation science centers to evaluate clinical effectiveness. NEJM AI. 2024;1(8). doi:10.1056/aip2400223

[49] Deakers C. Roadmap: Healthcare AI. Bessemer Venture Partners. Published September 25, 2024. https://www.bvp.com/atlas/roadmap-healthcare-ai. Accessed October 10, 2024

[50] Emma. Hippocratic AI. https://www.hippocraticai.com/emma. Accessed October 14, 2024

[51] Nayak A, Vakili S, Nayak K, Nikolov M, Chiu M, Sosseinheimer P, et al. Use of Voice-Based Conversational Artificial Intelligence for Basal Insulin Prescription Management Among Patients With Type 2 Diabetes: A Randomized Clinical Trial. JAMA network open. 2023;6(12). doi:10.1001/jamanetworkopen.2023.40232

[52] Huang W, Koh T, Tromp J, Chandramouli C, Ewe SH, Ng CT, et al. Point-of-care AI-enhanced novice echocardiography for screening heart failure (PANES-HF). Sci Rep. 2024;14(1):1-8. doi:10.1038/s41598-024-62467-4