Introduction

Artificial intelligence (AI) is transforming healthcare, with an explosion in the number of studied AI applications in medicine1. However, the adoption of AI into clinical practice has been slow and often fragmented, in part due to the significant regulatory, safety, and ethical challenges unique to healthcare AI. These challenges necessitate careful testing and collaborative development to ensure AI technologies are safe, ethical, private, equitable, user-friendly, and achieve their intended impact. Initial national and international regulatory actions acknowledge the need for timely guardrails around rapidly evolving AI technologies. Notable examples include the European Union AI Act2,3; the US White House executive order on the Safe, Secure, and Trustworthy Development and Use of AI4; and the healthcare-specific HTI-1 final rule from the US Department of Health and Human Services' Assistant Secretary for Technology Policy and Office of the National Coordinator for Health Information Technology (ONC), which mandates transparency for AI algorithms that are part of ONC-certified health information technology5. Likewise, others have proposed frameworks and best practices for governance of healthcare AI at the international, national6, and local levels7, with safety, trust8, ethics6,9, and equity10,11,12,13 as core tenets of the responsible use of AI in healthcare.

Major frameworks and guidelines for AI focus on model evaluation rather than implementation, including IBM’s AI Lifecycle Management framework for industry AI pipelines14; SPIRIT-AI and CONSORT-AI guidelines for reporting on clinical trials involving AI15,16,17,18,19,20; and the HTI-1 framework for AI transparency in healthcare5. The AI Lifecycle Management framework focuses on the technical stages of AI model development, deployment, and maintenance but may lack specificity in tailoring these stages to healthcare’s unique safety and ethical requirements. SPIRIT-AI and CONSORT-AI provide guidance on reporting standards for clinical trials involving AI but do not offer a stepwise approach to implementation, monitoring, and scaling AI tools in clinical practice. Finally, while HTI-1 provides regulatory guidance for transparency and ethical AI use, it does not define specific implementation stages for healthcare organizations to systematically validate and scale AI tools.

Guidance on implementation approaches for pragmatic deployment in this space is arguably nascent; we therefore advocate for a structured approach that is rapid enough to drive clinical change in a reasonable timeframe yet measured enough to yield defined outcomes and allow for scalability. Others have previously discussed practical concerns around scaling AI in healthcare21 and an approach to AI healthcare product development22. Here, we offer a framework for healthcare organizations to implement AI technologies safely and with impact, beyond scientific research, using in-house developed tools or vendor-based solutions.

Clinical trials framework

Clinical research regulated by the US Food and Drug Administration (FDA) proceeds in four phases: Phase 1, assessing safety and drug dosage in 20–100 healthy individuals or individuals with the disease; Phase 2, measuring efficacy and identifying side effects in hundreds of individuals with the disease; Phase 3, evaluating efficacy and benefit at larger scale and monitoring adverse reactions in 300–3000 individuals, often relative to standard of care; and Phase 4, post-market monitoring of safety and efficacy in thousands of individuals after FDA approval23. We propose a clinical trials informed framework for AI implementation in healthcare systems at scale that mirrors the four phases of FDA-regulated clinical trials, with phases covering safety, efficacy, effectiveness, and post-deployment monitoring, to systematically address critical concerns such as regulatory compliance, patient safety, and model validation. This approach ensures that AI solutions undergo rigorous validation at each stage, creating a foundation for safe and effective clinical integration. By following these stages, healthcare AI can transition more seamlessly from research settings into routine practice, minimizing risks while maximizing patient outcomes. This approach is similar to the one the American Medical Informatics Association, the US national organization of informatics professionals, researchers, and other members, has taken with case studies of AI24.

Although we recognize that the term AI encompasses a wide range of systems, in this perspective we focus on AI solutions based on machine learning or large language models; these concepts may nonetheless apply to other AI systems such as computer-interpretable clinical guidelines25 and argumentation for medical decision-making26. We also recognize that these principles may be most relevant in the US, which has its own AI regulations and a comparatively high clinical administrative burden.

Our clinical trials informed framework for AI solution deployment in healthcare comprises four phases (Table 1) and is intended for implementation, not regulation, of AI solutions that are not necessarily medical devices. Because these AI solutions are neither medical devices nor drugs, the framework includes pragmatic consideration of the clinical workflows required in the informatics realm.

Table 1 Clinical Trials Framework for Artificial Intelligence Applications

Phase 1: safety

This initial phase assesses the foundational safety of the AI model or tool. Here, models are deployed in a controlled, non-production setting, where they do not influence clinical decisions. Testing can be done retrospectively or in “silent mode,”27 where predictions are observed without impacting patient care. For example, a large language model might be used in clinical trial screening by evaluating retrospective electronic health record (EHR) notes to determine patient eligibility without risking patient outcomes28; the evaluation may also include validation/bias analyses to measure fairness across different patient demographics29, ensuring the model does not inadvertently disadvantage specific groups.
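To make the fairness component of this phase more concrete, the sketch below illustrates one way a retrospective, “silent mode” bias analysis might be structured: computing sensitivity, specificity, and flag rates for each demographic subgroup before the model is allowed to influence care. This is a simplified illustration rather than a description of any specific institution’s pipeline; the column names (y_true, y_score, race_ethnicity) and the 0.5 decision threshold are assumptions for the example.

```python
# Hypothetical retrospective subgroup check for a binary risk model ("silent mode").
# Column names and the decision threshold are illustrative assumptions.
import pandas as pd

def subgroup_performance(df: pd.DataFrame, group_col: str, threshold: float = 0.5) -> pd.DataFrame:
    """Sensitivity, specificity, and flag rate per demographic subgroup."""
    rows = []
    for group, sub in df.groupby(group_col):
        pred = sub["y_score"] >= threshold        # would-be alerts, never shown to clinicians
        tp = (pred & (sub["y_true"] == 1)).sum()
        fn = (~pred & (sub["y_true"] == 1)).sum()
        tn = (~pred & (sub["y_true"] == 0)).sum()
        fp = (pred & (sub["y_true"] == 0)).sum()
        rows.append({
            group_col: group,
            "n": len(sub),
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
            "flag_rate": pred.mean(),             # share of patients the model would flag
        })
    return pd.DataFrame(rows)

# Usage (with a retrospective, labeled cohort assembled by the implementation team):
# report = subgroup_performance(cohort, group_col="race_ethnicity")
# Large gaps in sensitivity or flag_rate across groups would prompt review before Phase 2.
```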

Phase 2: efficacy

In the second phase, the model’s efficacy is examined prospectively under ideal conditions, often by integrating it into live clinical environments with limited visibility to clinical staff. This phase tests whether the AI can perform accurately and beneficially in real-time workflows. During this phase, models are typically run “in the background,” allowing them to process real-world data without impacting clinical decision-making until performance is thoroughly vetted. Teams begin to organize data pipelines so that hospital data can be input into the model, and they identify which team members (such as a nurse or physician) will act on the output at which steps in clinical workflows. Teams must also determine when output will be displayed and how to deliver it in a timely, interpretable form. Examples include using AI to predict admission rates in the emergency department, where results are hidden from end-users to refine accuracy without influencing care30, and an AI-based acute coronary syndrome detection tool31, in which the ingestion of real-world data allowed for optimization of equity and fairness32,33.
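As one illustration of what running “in the background” can look like operationally, the sketch below scores newly arrived encounters on a schedule and writes the predictions to an audit log only, with no write-back to the EHR. The fetching function, model interface, and log location are hypothetical placeholders for site-specific components; this is a minimal example rather than a production pipeline.

```python
# Minimal sketch of a background ("silent") scoring job; nothing is surfaced to clinicians.
# fetch_new_encounters(), the model interface, and the log path are hypothetical placeholders.
import csv
import datetime as dt
from pathlib import Path

LOG_PATH = Path("silent_mode_predictions.csv")   # assumed location of the audit log

def run_silent_mode(model, fetch_new_encounters):
    """Score newly arrived encounters and append results to an audit log only."""
    encounters = fetch_new_encounters()           # e.g., the last 24 hours of ED visits
    with LOG_PATH.open("a", newline="") as f:
        writer = csv.writer(f)
        for enc in encounters:
            score = model.predict_proba([enc["features"]])[0][1]   # probability of the outcome
            writer.writerow([dt.datetime.now().isoformat(), enc["encounter_id"], round(score, 4)])
    # Deliberately no write-back to the EHR: output stays invisible to end users until
    # accuracy, calibration, and workflow fit have been vetted by the implementation team.
```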

Phase 3: effectiveness and/or comparison to existing standard

In this phase, the AI tool is deployed more broadly, and its effectiveness is assessed relative to current standards of care. In contrast to phase 2, which focuses on efficacy, a measure of an outcome under ideal circumstances, phase 3 focuses on effectiveness, a measure of benefit in a pragmatic real-world clinical setting34. This phase incorporates health outcome metrics, demonstrating real-world impact on patient care and clinician workflows. Implementation teams evaluate the model’s generalizability by testing it across various patient populations and clinical settings, measuring geographic and domain-specific performance29. A real-world example is ambient documentation, a generative AI platform being piloted by Stanford and Mass General Brigham (MGB) across multiple clinical specialties that converts patient-clinician conversations into draft notes, which the clinician reviews and edits before signing in the EHR system. The quality and usability of these notes are being compared to notes written by the clinicians themselves, while outcome measures of clinician experience and burnout are being assessed rigorously. An additional example is AI-generated in-basket draft replies, in which patient message content is sent securely to an EHR vendor’s OpenAI GPT-435 instance and a draft reply is generated, then edited by a clinical staff member (in our implementations, a physician, advanced practice provider, nurse, or pharmacist)36. Time spent replying to in-basket messages is being assessed to determine whether the AI technology is improving efficiency. Comparisons between clinician-written and AI-generated draft replies in domains such as professionalism and tone are examples of evaluation that extends beyond traditional process outcomes37.
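For the in-basket example, the efficiency question ultimately reduces to a paired comparison of clinician reply times before and after draft replies are enabled. The sketch below shows one simple way such a comparison could be run; the numbers are toy data, not study results, and the non-parametric test is our assumption rather than the published analysis plan.

```python
# Toy paired comparison of per-clinician reply times (seconds) before and after AI drafts.
# Values are illustrative only; the Wilcoxon signed-rank test is an assumed analysis choice.
import numpy as np
from scipy import stats

baseline = np.array([95.0, 120.0, 88.0, 140.0, 110.0, 102.0])     # pre-deployment medians
with_drafts = np.array([80.0, 118.0, 90.0, 121.0, 95.0, 99.0])    # post-deployment medians

stat, p_value = stats.wilcoxon(baseline, with_drafts)              # paired, non-parametric
print(f"Median change: {np.median(with_drafts - baseline):+.1f} s, p = {p_value:.3f}")
```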

Phase 4: monitoring (scaled and post-deployment surveillance)

After scaled deployment, AI tools require ongoing surveillance to track performance, safety, and equity over time. Continuous monitoring identifies any drift in model performance, while user feedback helps maintain alignment with clinical needs and safety standards. This phase ensures that as AI models evolve or face data shifts, they are recalibrated to remain effective and unbiased. The integration of monitoring systems into routine workflows allows for rapid identification of adverse events or bias, supporting sustained model integrity in clinical practice. Systems to detect model drift38 can inform model updates or de-implementation of ineffective AI solutions. Adopting existing methodology from traditional clinical decision support initiatives, such as override comments as a feedback mechanism for improving clinical decision support39 and the Vanderbilt Clickbusters initiative, which iteratively reviews clinical alerts to turn off unneeded alerts and to improve or add more targeted alerts40, can help ensure better clinical uptake and intervention efficacy. In addition, teams should disseminate findings so that other institutions can learn and share best practices.
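Model drift detection, mentioned above, can often be operationalized with relatively simple distribution checks. The sketch below computes a Population Stability Index (PSI) between a reference window of model scores and the current period; a PSI above roughly 0.2 is a common rule of thumb for meaningful drift. The threshold and the escalation hook are assumptions for illustration, not a prescribed monitoring standard.

```python
# Illustrative drift check: Population Stability Index (PSI) between a reference score
# distribution and the current period. Threshold and escalation are assumed examples.
import numpy as np

def population_stability_index(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI of `current` against quantile bins derived from `reference`."""
    edges = np.unique(np.quantile(reference, np.linspace(0, 1, bins + 1)))
    current = np.clip(current, edges[0], edges[-1])        # keep out-of-range scores in end bins
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    ref_frac = np.clip(ref_frac, 1e-6, None)               # avoid division by zero / log(0)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Example escalation logic (hypothetical threshold and governance hook):
# if population_stability_index(reference_scores, this_month_scores) > 0.2:
#     notify_model_governance_committee()
```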

Deploying AI at scale in healthcare systems faces several challenges, particularly when it comes to aligning AI-generated guidance between specialty practices and primary care. Stanford, for example, has evaluated patient prediction models that illustrate many of these challenges32. One major issue is a mismatch in recommendations, in which AI models trained in specialty settings may not align well with primary care workflows or guidelines. Furthermore, lack of coverage and reimbursement for certain tests or treatments recommended by AI may limit usage in real-world practice. Additionally, healthcare populations are often fragmented across multiple practices, with a third of patients in the United States lacking a primary care provider41. This fragmentation complicates patient management and follow-up, as adherence to AI-suggested interventions may fall through the cracks. This monitoring phase requires ongoing model assessment, feedback loops, and potential recalibration, which can be logistically complex.

Discussion

Using a clinical trials framework for healthcare AI provides a pragmatic, structured, stepwise approach to evaluating and scaling novel AI solutions in care delivery. This framework emphasizes patient safety, efficacy, and real-world applicability. By mirroring the rigorous processes of traditional clinical trials, it offers a robust path to validate AI tools comprehensively, ensuring these technologies benefit diverse populations without introducing unintended risks. This approach also addresses the unique challenges of healthcare AI, including regulatory variability, ethical considerations, model drift, and data generalizability, while emphasizing continuous monitoring to sustain model integrity over time. Other healthcare-focused frameworks concentrate more on reporting standards (such as SPIRIT-AI and CONSORT-AI) or regulatory guidance (such as HTI-1).

In the US, while external clinical decision support may be considered a medical device42 and potentially be subject to formal FDA review43, healthcare organizations can deploy in-house AI models without FDA certification, allowing for significant flexibility in internal clinical use. The need for rigorous and often prolonged evaluation of external solutions, in turn, limits their immediate market availability. This regulatory flexibility contrasts with requirements in other jurisdictions, where most clinical AI tools must be certified before use. Addressing such regulatory variation is essential for ensuring the framework’s applicability across global healthcare settings, balancing flexibility for internal use with structured validation for external deployment.

This framework may be less applicable to AI applications in broader healthcare settings, such as public health or community health programs, where direct clinical workflow integration is not always feasible or necessary. In addition, the clinical trials approach to AI-based healthcare technologies may not be applicable to small- to medium-sized healthcare organizations, which may implement these tools only once they have already reached the ongoing monitoring stage. Analogous to traditional bench or clinical research, these AI clinical trials are more likely to occur at larger academic medical centers, as they require resources, financial investment, and AI-specific technical expertise44. However, while large academic medical centers are likely to lead these efforts, it is crucial that the lessons learned from these initiatives are shared across all healthcare communities, including community healthcare centers and safety net hospitals. By disseminating knowledge and best practices, we can ensure that all populations benefit from safe, effective, and equitable AI solutions.

We recognize that there are distinct challenges to monitoring AI-based technologies in healthcare that may limit some generalizability of findings. For example, with ambient documentation, our institutions have observed differences in configurations, underlying large language models, device support, and EHR integration across different vendors, compounded by rapid platform feature and model changes in a competitive vendor market. On the implementation side, institutions launch ambient documentation technology with different user specialties and different numbers of users. Standardized benchmarks and metrics may help mitigate some of this variability in experience and performance. For example, in Phase 2 of our framework, test case libraries for regular validation (test messages, standardized recordings) could periodically be used by vendors to monitor performance.
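One concrete form such a test case library could take is a small regression harness that replays standardized inputs against the deployed service after each vendor or model update and flags outputs that diverge from previously accepted references. The sketch below uses a crude lexical similarity measure purely for illustration; the file format, the generate_draft() function, and the similarity threshold are all hypothetical, and real evaluations would pair automated checks with human review.

```python
# Hypothetical regression harness for a standardized test case library; the library format,
# generate_draft() service call, and similarity threshold are illustrative assumptions.
import json
from difflib import SequenceMatcher

def run_regression_suite(generate_draft, library_path="test_case_library.json",
                         min_similarity=0.8):
    """Flag test cases whose new output diverges from the previously accepted reference."""
    with open(library_path) as f:
        cases = json.load(f)    # e.g., [{"id": ..., "input": ..., "reference_output": ...}, ...]
    regressions = []
    for case in cases:
        new_output = generate_draft(case["input"])
        similarity = SequenceMatcher(None, case["reference_output"], new_output).ratio()
        if similarity < min_similarity:             # lexical proxy only; not a clinical judgment
            regressions.append({"id": case["id"], "similarity": round(similarity, 2)})
    return regressions          # any entries here would prompt human review before rollout
```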

When deploying AI in healthcare, it is essential to prioritize outcomes and safety rather than solely focus on process measures and model performance, as we highlight in Phase 3. While metrics such as AI-drafted note accuracy or draft reply generation times are important, they do not fully capture the real-world impact of AI on patient care. AI solutions must demonstrate that they improve health outcomes, reduce harm, and contribute to better overall patient experiences45. Emphasizing patient safety across Phases 1–4 ensures that AI solutions are used responsibly, minimizing the risk of unintended consequences such as exacerbating health disparities or introducing bias. By shifting the focus toward meaningful outcomes, especially the equity impact of AI solutions at different levels of health, ranging from individual to population-level health13, healthcare systems can better assess the true value of AI solutions and ensure they enhance care in ways that align with the broader goals of equity and quality improvement.

Each of the phases we highlight relies on the availability of high-quality, diverse datasets for testing and validation. However, data quality and representation vary widely, particularly for underrepresented patient groups, which could limit the framework’s effectiveness in promoting equitable AI. More diverse, cross-institutional data will allow us to test the fairness and generalizability of the AI solutions we develop, which should be evaluated in Phase 2 of our framework. While the specifics of how institutions should approach implementation of these technologies can be debated, there is a clear need for greater regulatory guidance on their use, echoing other calls for a careful approach that recognizes the unique challenges of generative AI46 and incorporates the input of the aforementioned stakeholders. There is also a need for better systems and regulations to enable more federated, cross-institutional pooling of data to improve the performance of these tools.

We advocate that there is a pressing need for broad stakeholder engagement, governmental support (e.g., NIH funding), and industry sponsorship to rigorously and systematically study AI technologies, thereby enabling novel AI solutions to be validated and scaled across healthcare systems. Groups like the MIT Task Force on the Work of the Future47, the Coalition for Health AI (CHAI)48, and other more solution-specific interinstitutional collaborations can provide shared lessons. MGB and Stanford are both part of CHAI. MGB is part of the Ambient Clinical Documentation Collaborative, a group of academic medical centers implementing ambient documentation, to share insights and “invent the wheel” together. Stanford plays a lead role in many of these organizations, as well as promoting local initiatives such as Responsible AI for Safe and Equitable Health (RAISE Health)49 and Human-Centered AI50.

Finally, as informatics and healthcare system leaders construct and implement AI for pragmatic use in clinical and administrative workflows, teams must consider a solution’s financial viability during early planning stages. While AI offers alluring potential, it may not be appropriate for answering a specific question or solving a specific problem if cost becomes unsustainable. Cost considerations include not only the initial technical cost of building the AI solution, but also costs related to uptake, training of staff, trust-building with communities regarding safe and equitable healthcare AI applications, and maintenance of these solutions; these costs should then be weighed against return on investment. Cost should be factored in with pragmatic outcomes, patient-oriented outcomes, or other meaningful outcomes to justify testing and scaling the technology. This mindset will prevent unnecessary repetition of pilots that do not yield scalable, financially tenable solutions.

Importantly, we recognize that healthcare AI is a rapidly evolving field, and the framework may require adaptation across international regulatory environments and differing clinical settings. By sharing implementation insights and best practices, particularly from early adopters, we aim to support broader, equitable adoption of AI tools across all healthcare environments, from large academic centers to community hospitals. Ultimately, this framework provides a pathway for safe and effective AI in healthcare, aligning technological advancement with the goals of patient-centered outcomes, equity, and long-term societal benefit.

In conclusion, while AI holds promise for transforming healthcare, its deployment must be approached with caution and rigor. Adopting a clinical trials framework ensures that AI solutions are thoroughly tested for safety, efficacy, and effectiveness before widespread implementation. Teams should measure patient outcomes, safety, and equity rather than solely focusing on process improvements or model performance. By sharing lessons learned from early adopters, including academic medical centers, across all healthcare settings, we can ensure that AI solutions are both effective and equitable, benefiting diverse populations and improving the quality of care for all.