Abstract
With rapidly evolving artificial intelligence solutions, healthcare organizations need an implementation roadmap. A “clinical trials” informed approach can promote safe and impactful implementation of artificial intelligence. This framework includes four phases: (1) Safety; (2) Efficacy; (3) Effectiveness and comparison to an existing standard; and (4) Monitoring. Combined with inter-institutional collaboration and national funding support, this approach will advance safe, usable, effective, and equitable deployments of artificial intelligence in healthcare.
Introduction
Artificial intelligence (AI) is transforming healthcare with an explosion in the number of studied AI applications in medicine1. However, the adoption of AI into clinical practice has been slow and often fragmented, in part due to the significant regulatory, safety, and ethical challenges unique to healthcare AI. These challenges necessitate careful testing and collaborative development to ensure AI technologies are safe, ethical, private, equitable, and user-friendly, and that they achieve their intended impact. Initial national and international regulatory actions acknowledge the need for timely guardrails around rapidly evolving AI technologies. Notable examples include the European Union AI Act2,3; the US White House executive order on the Safe, Secure, and Trustworthy Development and Use of AI4; and the healthcare-specific HTI-1 final rule from the US Department of Health and Human Services, the Assistant Secretary for Technology Policy, and the Office of the National Coordinator for Health Information Technology (ONC), which mandates transparency for AI algorithms that are part of ONC-certified health information technology5. Likewise, others have proposed frameworks and best practices for governance of healthcare AI at the international, national6, and local levels7, with safety, trust8, ethics6,9, and equity10,11,12,13 as core tenets of the responsible use of AI in healthcare.
Major frameworks and guidelines for AI focus on model evaluation rather than implementation, including IBM’s AI Lifecycle Management framework for industry AI pipelines14; SPIRIT-AI and CONSORT-AI guidelines for reporting on clinical trials involving AI15,16,17,18,19,20; and the HTI-1 framework for AI transparency in healthcare5. The AI Lifecycle Management Framework typically focuses on the technical stages of AI model development, deployment, and maintenance but may lack specificity in tailoring these stages to healthcare’s unique safety and ethical requirements. SPIRIT-AI and CONSORT-AI provide guidance on reporting standards for clinical trials involving AI but do not offer a stepwise approach to implementation, monitoring, and scaling AI tools in clinical practice. Finally, while HTI-1 provides regulatory guidance for transparency and ethical AI use, it does not provide specific implementation stages for healthcare organizations to systematically validate and scale AI tools.
Guidance on implementation approaches for pragmatic deployment in this space is arguably nascent; we therefore advocate for a structured approach that is rapid enough to effect clinical change in a reasonable timeframe yet measured enough to yield defined outcomes and allow for scalability. Others have previously discussed practical concerns around scaling AI in healthcare21 and an approach to AI healthcare product development22. Here, we offer a framework for healthcare organizations to implement AI technologies safely and with impact, beyond scientific research, using in-house developed tools or vendor-based solutions.
Clinical trials framework
Clinical research regulated by the US Food and Drug Administration (FDA) proceeds in four phases: Phase 1 establishes safety and drug dosage in 20–100 healthy individuals or individuals with the disease; Phase 2 measures efficacy and identifies side effects in hundreds of individuals with the disease; Phase 3 assesses efficacy/benefit at larger scale and monitors adverse reactions in 300–3000 individuals, often relative to standard of care; and Phase 4 involves post-approval, post-market monitoring of safety and efficacy in thousands of individuals23. We propose a clinical trials informed framework for AI implementation in healthcare systems at scale, which mirrors the four phases of FDA-regulated clinical trials (safety, efficacy, effectiveness, and post-deployment monitoring) to systematically address critical concerns such as regulatory compliance, patient safety, and model validation. This approach ensures that AI solutions undergo rigorous validation at each stage, creating a foundation for safe and effective clinical integration. By following these stages, healthcare AI can transition more seamlessly from research settings into routine practice, minimizing risks while maximizing patient outcomes. This approach is similar to the one taken by the American Medical Informatics Association, the US national organization of informatics professionals, researchers, and other members, with its AI case studies24.
For this Perspective, we recognize that the term AI encompasses a wide range of systems; we focus on AI solutions based on machine learning or large language models, though these concepts may also apply to other AI systems such as computer-interpretable clinical guidelines25 and argumentation for medical decision-making26. We also recognize that these principles may be most relevant in the US, which has its own AI regulations and a comparatively complex clinical administrative burden.
Our clinical trials informed framework for deploying AI solutions in healthcare is intended for AI solutions that are not necessarily medical devices and comprises four phases (Table 1) oriented toward implementation, not regulation. Because these AI solutions are neither medical devices nor drugs, the framework includes pragmatic consideration of the clinical workflows required in the informatics realm.
Phase 1: safety
This initial phase assesses the foundational safety of the AI model or tool. Here, models are deployed in a controlled, non-production setting, where they do not influence clinical decisions. Testing can be done retrospectively or in “silent mode,”27 where predictions are observed without impacting patient care. For example, a large language model might be used in clinical trial screening by evaluating retrospective electronic health record (EHR) notes to determine patient eligibility without risking patient outcomes28; the evaluation may also include validation/bias analyses to measure fairness across different patient demographics29, ensuring the model does not inadvertently disadvantage specific groups.
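To make this concrete, below is a minimal sketch of a Phase 1 silent-mode evaluation in Python, assuming a retrospective cohort exported to a CSV file; the file name, column names (eligible, score, race_ethnicity), and the 0.5 threshold are illustrative assumptions rather than details of the deployments cited above.

```python
import pandas as pd
from sklearn.metrics import recall_score, roc_auc_score

# Phase 1 "silent mode" check on a retrospective cohort: scores were generated
# offline and never influenced care. File name, column names, and the threshold
# below are illustrative assumptions, not the tools described in the text.
cohort = pd.read_csv("retrospective_cohort.csv")
# assumed columns: eligible (0/1 chart-review label), score (model probability),
#                  race_ethnicity (demographic group used for the bias analysis)

THRESHOLD = 0.5  # assumed operating point for this sketch

def safety_metrics(df: pd.DataFrame) -> dict:
    """Discrimination plus sensitivity, so missed-eligibility harm stays visible."""
    if df["eligible"].nunique() < 2:
        return {"n": len(df), "auroc": None, "sensitivity": None}
    preds = (df["score"] >= THRESHOLD).astype(int)
    return {
        "n": len(df),
        "auroc": round(roc_auc_score(df["eligible"], df["score"]), 3),
        "sensitivity": round(recall_score(df["eligible"], preds), 3),
    }

# Overall performance, then the same metrics per demographic group to surface
# subgroups the model may inadvertently disadvantage.
print("overall:", safety_metrics(cohort))
for group, subset in cohort.groupby("race_ethnicity"):
    print(group, safety_metrics(subset))
```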
Phase 2: efficacy
In the second phase, the model’s efficacy is examined prospectively under ideal conditions, often by integrating it into live clinical environments with limited visibility to clinical staff. This phase tests whether the AI can perform accurately and beneficially in real-time workflows. During this phase, models typically run “in the background,” allowing them to process real-world data without impacting clinical decision-making until performance is thoroughly vetted. Teams begin to organize data pipelines so that hospital data can be fed into the model, and they identify which team members (such as a nurse or physician) will act on model output at which steps in clinical workflows. Teams must also determine the timepoints at which output is displayed and the means of delivering timely, interpretable output. Examples include using AI to predict admission rates in the emergency department, where results are hidden from end-users to refine accuracy without influencing care30, and an AI-based acute coronary syndrome detection tool31, in which the ingestion of real-world data allowed for optimization of equity and fairness32,33.
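A minimal sketch of what running “in the background” can look like in Phase 2 follows, assuming a hypothetical scoring function and a local SQLite log of predictions; in practice the pipeline, model call, and storage would be institution-specific, and nothing here is surfaced to clinicians.

```python
import datetime
import json
import sqlite3

# Sketch of Phase 2 shadow-mode scoring: the pipeline feeds live hospital data
# to the model and logs every prediction for later review, but no output is
# displayed in the EHR. Function, table, and field names are hypothetical.

def score_admission_risk(features: dict) -> float:
    """Stand-in for the deployed model; replace with the real inference call."""
    return 0.0  # placeholder value so the sketch runs end to end

db = sqlite3.connect("shadow_predictions.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS predictions "
    "(encounter_id TEXT, scored_at TEXT, score REAL, features TEXT)"
)

def handle_new_encounter(encounter_id: str, features: dict) -> None:
    """Called by the live data pipeline; records, but does not act on, the score."""
    score = score_admission_risk(features)
    db.execute(
        "INSERT INTO predictions VALUES (?, ?, ?, ?)",
        (encounter_id,
         datetime.datetime.now(datetime.timezone.utc).isoformat(),
         score,
         json.dumps(features)),
    )
    db.commit()
    # Intentionally no alert, note, or order is generated in this phase.

handle_new_encounter("enc-001", {"age": 67, "chief_complaint": "chest pain"})
```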
Phase 3: effectiveness and/or comparison to existing standard
In this phase, the AI tool is deployed more broadly, and its effectiveness is assessed relative to current standards of care. In contrast to phase 2, which focuses on efficacy, a measure of an outcome under ideal circumstances, phase 3 focuses on effectiveness, a measure of benefit in a pragmatic real-world clinical setting34. This phase incorporates health outcome metrics, demonstrating real-world impact on patient care and clinician workflows. Implementation teams evaluate the model’s generalizability by testing it across various patient populations and clinical settings, measuring geographic and domain-specific performance29. A real-world example is ambient documentation, a generative AI platform being piloted by Stanford and Mass General Brigham (MGB) across multiple clinical specialties that converts patient-clinician conversations into draft notes, which are reviewed and edited by the clinician before signing in the EHR system. The quality and usability of these notes are being compared to notes written by the clinicians themselves, while outcome measures of clinician experience and burnout are being assessed rigorously. An additional example is AI-generated inbasket draft replies, in which patient message content is sent securely to an EHR vendor-hosted instance of OpenAI’s GPT-4 model35 and a draft reply is generated, then edited by a member of the clinical staff (in our implementations, a physician, advanced practice provider, nurse, or pharmacist)36. Time spent replying to inbasket messages is being assessed to determine whether the AI technology improves efficiency. Comparisons between clinician-written and AI-generated draft replies in domains such as professionalism and tone are examples of evaluation that extend beyond traditional process outcomes37.
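As an illustration of a Phase 3 comparison against the existing standard, the sketch below contrasts inbasket reply times between AI-drafted and usual-care workflows and repeats the comparison by site; the data file, column names, and choice of a Mann-Whitney test are assumptions for this example, not the protocol of the pilots described above.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Sketch of a Phase 3 effectiveness comparison: time to reply with AI-drafted
# replies versus the existing standard (clinician-written from scratch).
# File and column names are assumed for illustration.
msgs = pd.read_csv("inbasket_reply_times.csv")
# assumed columns: arm ("ai_draft" or "usual_care"), reply_minutes, clinic_site

ai = msgs.loc[msgs["arm"] == "ai_draft", "reply_minutes"]
usual = msgs.loc[msgs["arm"] == "usual_care", "reply_minutes"]

stat, p = mannwhitneyu(ai, usual, alternative="two-sided")
print(f"median AI {ai.median():.1f} min vs usual {usual.median():.1f} min (p={p:.3f})")

# Generalizability check: repeat the comparison per clinic site, since Phase 3
# also asks whether any benefit holds across settings and populations.
for site, grp in msgs.groupby("clinic_site"):
    print(site, grp.groupby("arm")["reply_minutes"].median().to_dict())
```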
Phase 4: monitoring (scaled and post-deployment surveillance)
After scaled deployment, AI tools require ongoing surveillance to track performance, safety, and equity over time. Continuous monitoring identifies any drift in model performance, while user feedback helps maintain alignment with clinical needs and safety standards. This phase ensures that as AI models evolve or face data shifts, they are recalibrated to remain effective and unbiased. The integration of monitoring systems into routine workflows allows for rapid identification of adverse events or bias, supporting sustained model integrity in clinical practice. Systems to detect model drift38 can inform model updates or de-implementation of ineffective AI solutions. Adopting existing methodology from traditional clinical decision support initiatives, such as override comments as a feedback mechanism for improving clinical decision support39 and the Vanderbilt Clickbusters initiative, which iteratively reviews clinical alerts to turn off unneeded alerts and to improve or add more targeted alerts40, can help ensure better clinical uptake and intervention efficacy. In addition, teams should disseminate findings so that other institutions can learn and share best practices.
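A minimal sketch of the kind of drift surveillance described above follows, assuming predictions and eventual outcomes are logged to a CSV file; the baseline AUROC, drift tolerance, and monthly windowing are illustrative choices, not validated monitoring thresholds.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Post-deployment surveillance sketch: compare recent performance and
# calibration against values established at go-live, and flag drift for review.
# File name, column names, and thresholds below are assumptions.
BASELINE_AUROC = 0.82          # assumed value measured during Phases 2-3
AUROC_DRIFT_TOLERANCE = 0.05   # assumed tolerance before triggering review

log = pd.read_csv("production_predictions.csv", parse_dates=["scored_at"])
# assumed columns: scored_at, score, outcome (0/1, available after follow-up)
log = log.set_index("scored_at").sort_index()

for month, window in log.groupby(pd.Grouper(freq="MS")):
    if window["outcome"].nunique() < 2:
        continue  # AUROC is undefined without both outcome classes
    auroc = roc_auc_score(window["outcome"], window["score"])
    calibration_gap = window["score"].mean() - window["outcome"].mean()
    flag = "DRIFT REVIEW" if auroc < BASELINE_AUROC - AUROC_DRIFT_TOLERANCE else "ok"
    print(f"{month:%Y-%m}: AUROC={auroc:.3f}, mean(score)-rate={calibration_gap:+.3f} [{flag}]")
```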
Deploying AI at scale in healthcare systems faces several challenges, particularly when it comes to aligning AI-generated guidance between specialty practices and primary care. Stanford has evaluated patient prediction models that illustrate many of these challenges32. One major issue is a mismatch in recommendations, in which AI models trained in specialty settings may not align well with primary care workflows or guidelines. Furthermore, lack of coverage and reimbursement for certain tests or treatments recommended by AI may limit usage in real-world practice. Additionally, healthcare populations are often fragmented across multiple practices, with a third of patients in the United States lacking a primary care provider41. This fragmentation complicates patient management and follow-up, as compliance with AI-suggested interventions may fall through the cracks. This monitoring phase requires ongoing model assessment, feedback loops, and potential recalibration, which can be logistically complex.
Discussion
Using a clinical trials framework for healthcare AI provides a pragmatic, structured, stepwise approach to evaluating and scaling novel AI solutions in care delivery. This framework emphasizes patient safety, efficacy, and real-world applicability. By mirroring the rigorous processes of traditional clinical trials, it offers a robust path to validate AI tools comprehensively, ensuring these technologies benefit diverse populations without introducing unintended risks. This approach also addresses the unique challenges of healthcare AI, including regulatory variability, ethical considerations, model drift, and data generalizability, while emphasizing continuous monitoring to sustain model integrity over time. Other healthcare-focused frameworks concentrate instead on reporting standards (such as SPIRIT-AI/CONSORT-AI) or regulatory guidance (such as HTI-1).
In the US, while external clinical decision support may be considered a medical device42 and potentially be subject to formal FDA review43, healthcare organizations can deploy in-house AI models without FDA certification, allowing for significant flexibility in internal clinical use. The need for rigorous and often prolonged evaluation of external solutions, in turn, limits their immediate market availability. This regulatory flexibility contrasts with requirements in other jurisdictions, where most clinical AI tools must be certified before use. Addressing such regulatory variation is essential for ensuring the framework’s applicability across global healthcare settings, balancing flexibility for internal use with structured validation for external deployment.
This framework may be less applicable for AI applications in broader healthcare settings, such as public health or community health programs, where direct clinical workflow integration is not always feasible or necessary. In addition, the clinical trials approach to AI-based healthcare technologies may not be applicable to small- and medium-sized healthcare organizations, which may implement these tools only once they have already reached the ongoing monitoring stage. Analogous to traditional bench or clinical research, these AI clinical trials are more likely to occur at larger academic medical centers, as they require resources, financial investment, and AI-specific technical expertise44. However, while large academic medical centers are likely to lead these efforts, it is crucial that the lessons learned from these initiatives are shared across all healthcare communities, including community healthcare centers and safety net hospitals. By disseminating knowledge and best practices, we can ensure that all populations benefit from safe, effective, and equitable AI solutions.
We recognize that there are distinct challenges to monitoring AI-based technologies in healthcare that may limit some generalizability of findings. For example, with ambient documentation, our institutions have observed differences in configurations, underlying large language models, device support, and EHR integration across different vendors, compounded by rapid platform feature and model changes in a competitive vendor market. On the implementation side, institutions launch ambient documentation technology with different user specialties and different numbers of users. Standardized benchmarks and metrics may help mitigate some of this variability in experience and performance. For example, in Phase 2 of our framework, test case libraries for regular validation (test messages, standardized recordings) could periodically be used by vendors to monitor performance.
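For instance, such a test case library could be operationalized as a simple regression harness like the sketch below; the JSON format, field names, and string-matching checks are assumptions for illustration, and the model call is a placeholder for the vendor API rather than any specific product.

```python
import json

# Sketch of a standing test-case library: a fixed set of standardized inputs
# with expected properties, re-run whenever the vendor changes the model or
# platform. File format, fields, and checks are assumptions for illustration.
with open("test_case_library.json") as f:
    cases = json.load(f)
# each case assumed to look like:
# {"case_id": "msg-014", "input": "...", "must_contain": ["schedule a visit"],
#  "must_not_contain": ["definitive diagnosis"]}

def generate_draft_reply(message_text: str) -> str:
    """Stand-in for the vendor/model call being validated."""
    return ""  # replace with the real API call

failures = []
for case in cases:
    draft = generate_draft_reply(case["input"]).lower()
    missing = [s for s in case.get("must_contain", []) if s.lower() not in draft]
    forbidden = [s for s in case.get("must_not_contain", []) if s.lower() in draft]
    if missing or forbidden:
        failures.append((case["case_id"], missing, forbidden))

print(f"{len(cases) - len(failures)}/{len(cases)} cases passed")
for case_id, missing, forbidden in failures:
    print(f"  {case_id}: missing={missing} forbidden={forbidden}")
```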
When deploying AI in healthcare, it is essential to prioritize outcomes and safety rather than solely focusing on process measures and model performance, as we highlight in Phase 3. While metrics such as AI-drafted note accuracy or draft reply generation times are important, they do not fully capture the real-world impact of AI on patient care. AI solutions must demonstrate that they improve health outcomes, reduce harm, and contribute to better overall patient experiences45. Emphasizing patient safety across Phases 1–4 ensures that AI solutions are used responsibly, minimizing the risk of unintended consequences such as exacerbating health disparities or introducing bias. By shifting the focus toward meaningful outcomes, especially determining the equity impact of AI solutions at levels ranging from individual to population health13, healthcare systems can better assess the true value of AI solutions and ensure they enhance care in ways that align with the broader goals of equity and quality improvement.
Each of the phases we highlight relies on the availability of high-quality, diverse datasets for testing and validation. However, data quality and representation can vary widely, particularly for underrepresented patient groups, which could limit the framework’s effectiveness in promoting equitable AI. More diverse, cross-institutional data will allow us to test the fairness and generalizability of the AI solutions we develop, which should be evaluated in Phase 2 of our framework. While the specifics of how institutions should implement these technologies can be debated, there is also a clear need for greater regulatory guidance on their use, echoing other calls for a careful approach that recognizes the unique challenges of generative AI46 and incorporates input from relevant stakeholders, as well as for better systems and regulations that enable more federated, cross-institutional pooling of data to improve the performance of these tools.
We advocate that there is a pressing need for broad stakeholder engagement, governmental support (e.g., NIH funding), and industry sponsorship to rigorously and systematically study AI technologies, thereby enabling novel AI solutions to be validated and scaled across healthcare systems. Groups such as the MIT Task Force on the Work of the Future47, the Coalition for Health AI (CHAI)48, and other more solution-specific interinstitutional collaborations can provide shared lessons. MGB and Stanford are both part of CHAI. MGB is part of the Ambient Clinical Documentation Collaborative, a group of academic medical centers implementing ambient documentation, to share insights and “invent the wheel” together on ambient documentation. Stanford plays a lead role in many of these organizations, as well as in promoting local initiatives such as Responsible AI for Safe and Equitable Health (RAISE Health)49 and Human-Centered AI50.
Finally, as informatics and healthcare system leaders construct and implement AI for pragmatic use in clinical and administrative workflows, teams must consider a solution’s financial viability during early planning stages. While AI offers alluring potential, it may not be appropriate for answering a specific question or solving a specific problem if its cost becomes unsustainable. Cost considerations include not only the initial technical cost of building the AI solution, but also costs related to uptake, training of staff, trust-building with communities regarding safe and equitable healthcare AI applications, and maintenance of these solutions; cost should then be weighed against return on investment. Cost should be factored in alongside pragmatic outcomes, patient-oriented outcomes, or other meaningful outcomes to justify testing and scaling the technology. This mindset will prevent unnecessary repetition of pilots that do not yield scalable, financially tenable solutions.
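As a purely illustrative back-of-the-envelope weighing of cost against return on investment, the sketch below uses entirely hypothetical figures; none of these numbers reflect actual vendor pricing, staffing costs, or measured time savings.

```python
# Illustrative only: hypothetical first-year cost vs. benefit check for an AI
# documentation tool; every number below is an assumption, not a reported result.
license_cost = 250_000                 # annual vendor license (assumed)
integration_and_training = 120_000     # build, workflow design, staff training (assumed)
monitoring_cost = 60_000               # validation, equity audits, drift surveillance (assumed)

users = 200                            # clinicians using the tool (assumed)
minutes_saved_per_user_per_day = 10    # assumed time savings
clinic_days_per_year = 220
value_per_clinician_hour = 150         # assumed loaded cost of clinician time

total_cost = license_cost + integration_and_training + monitoring_cost
hours_saved = users * minutes_saved_per_user_per_day * clinic_days_per_year / 60
estimated_benefit = hours_saved * value_per_clinician_hour

print(f"cost ${total_cost:,.0f} vs estimated time-value ${estimated_benefit:,.0f}")
print(f"return on investment: {(estimated_benefit - total_cost) / total_cost:.1%}")
```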
Importantly, we recognize that healthcare AI is a rapidly evolving field, and the framework may require adaptation across international regulatory environments and differing clinical settings. By sharing implementation insights and best practices, particularly from early adopters, we aim to support broader, equitable adoption of AI tools across all healthcare environments, from large academic centers to community hospitals. Ultimately, this framework provides a pathway for safe and effective AI in healthcare, aligning technological advancement with the goals of patient-centered outcomes, equity, and long-term societal benefit.
In conclusion, while AI holds promise for transforming healthcare, its deployment must be approached with caution and rigor. Adopting a clinical trials framework ensures that AI solutions are thoroughly tested for safety, efficacy, and effectiveness before widespread implementation. Teams should measure patient outcomes, safety, and equity rather than solely focusing on process improvements or model performance. By sharing lessons learned from early adopters, including academic medical centers, across all healthcare settings, we can ensure that AI solutions are both effective and equitable, benefiting diverse populations and improving the quality of care for all.
Data availability
No datasets were generated or analysed during the current study.
References
Mesko, B. & Görög, M. A short guide for medical professionals in the era of artificial intelligence. npj Digit. Med. 3, 126 (2020).
European Parliament. EU AI Act: first regulation on artificial intelligence. European Parliament https://www.europarl.europa.eu/topics/en/article/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence (2024).
Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 Laying down Harmonised Rules on Artificial Intelligence and Amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/1828 (Artificial Intelligence Act). (2024).
The White House. Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (2023).
Office of the National Coordinator for Health Information Technology, Department of Health and Human Services. Health Data, Technology, and Interoperability: Certification Program Updates, Algorithm Transparency, and Information Sharing. 45 CFR § 170, 171 (2024).
World Health Organization. Executive Summary: Ethics and Governance of Artificial Intelligence for Health (2021).
Reddy, S., Allan, S., Coghlan, S. & Cooper, P. A governance model for the application of AI in health care. J. Am. Med. Inform. Assoc. 27, 491–497 (2020).
Labkoff, S. et al. Toward a responsible future: recommendations for AI-enabled clinical decision support. J. Am. Med. Inform. Assoc. 31, 2730–2739 (2024).
Li, F., Ruijs, N. & Lu, Y. Ethics & AI: a systematic review on ethical concerns and related strategies for designing with AI in healthcare. AI 4, 28–53 (2023).
Garba-Sani, Z., Farinacci-Roberts, C., Essien, A. & Yracheta, J. M. A.C.C.E.S.S. AI: a new framework for advancing health equity in health care AI. Health Affairs https://doi.org/10.1377/forefront.20240424.369302 (2024).
Dankwa-Mullan, I. et al. A Proposed framework on integrating health equity and racial justice into the artificial intelligence development lifecycle. J. Health Care Poor Underserved 32, 300–317 (2021).
Clark, C. R. et al. Health care equity in the use of advanced analytics and artificial intelligence technologies in primary care. J. Gen. Intern. Med. 36, 3188–3193 (2021).
Rodriguez, J. A., Alsentzer, E. & Bates, D. W. Leveraging large language models to foster equity in healthcare. J. Am. Med. Inform. Assoc. 31, 2147–2150 (2024).
Ishizaki, K. AI model lifecycle management: overview. IBM https://www.ibm.com/think/topics/ai-lifecycle (2020).
Cruz Rivera, S., Liu, X., Chan, A.-W., Denniston, A. K. & Calvert, M. J. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Lancet Digit. Health 2, e549–e560 (2020).
Cruz Rivera, S., Liu, X., Chan, A.-W., Denniston, A. K. & Calvert, M. J. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nat. Med. 26, 1351–1363 (2020).
Rivera, S. C., Liu, X., Chan, A.-W., Denniston, A. K. & Calvert, M. J. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. BMJ 370, m3210 (2020).
Liu, X., Cruz Rivera, S., Moher, D., Calvert, M. J. & Denniston, A. K. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Lancet Digit Health 2, e537–e548 (2020).
Liu, X., Rivera, S. C., Moher, D., Calvert, M. J. & Denniston, A. K. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. BMJ 370, m3164 (2020).
Liu, X., Cruz Rivera, S., Moher, D., Calvert, M. J. & Denniston, A. K. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat. Med 26, 1364–1374 (2020).
Esmaeilzadeh, P. Challenges and strategies for wide-scale artificial intelligence (AI) deployment in healthcare practices: a perspective for healthcare organizations. Artif. Intell. Med. 151, 102861 (2024).
Higgins, D. & Madai, V. I. From bit to bedside: a practical framework for artificial intelligence product development in healthcare. Adv. Intell. Syst. 2, 2000052 (2020).
Food and Drug Administration. Step 3: clinical research. FDA https://www.fda.gov/patients/drug-development-process/step-3-clinical-research (2018).
American Medical Informatics Association. AMIA 2024 artificial intelligence evaluation showcase. AMIA https://amia.org/education-events/amia-2024-artificial-intelligence-evaluation-showcase (2024).
Peleg, M. Computer-interpretable clinical guidelines: a methodological review. J. Biomed. Inf. 46, 744–763 (2013).
Fox, J. Cognitive systems at the point of care: The CREDO program. J. Biomed. Inform. 68, 83–95 (2017).
Nestor, B. et al. Preparing a clinical support model for silent mode in general internal medicine. In Proceedings of the 5th Machine Learning for Healthcare Conference Vol. 126 (eds Doshi-Velez, F. et al.) 950–972 (PMLR, 2020).
Unlu, O. et al. Retrieval-augmented generation–enabled GPT-4 for clinical trial screening. NEJM AI 1, AIoa2400181 (2024).
de Hond, A. A. H. et al. Perspectives on validation of clinical predictive algorithms. npj Digit. Med. 6, 86 (2023).
Dadabhoy, F. Z. et al. Prospective external validation of a commercial model predicting the likelihood of inpatient admission from the emergency department. Ann. Emerg. Med. 81, 738–748 (2023).
Bunney, G. et al. Beyond chest pain: Incremental value of other variables to identify patients for an early ECG. Am. J. Emerg. Med. 67, 70–78 (2023).
Callahan, A. et al. Standing on FURM ground: a framework for evaluating fair, useful, and reliable AI models in health care systems. NEJM Catal. 5, CAT.24.0131 (2024).
Röösli, E., Bozkurt, S. & Hernandez-Boussard, T. Peeking into a black box, the fairness and generalizability of a MIMIC-III benchmarking model. Sci. Data 9, 24 (2022).
Agency for Healthcare Research and Quality (US). Criteria for Distinguishing Effectiveness from Efficacy Trials in Systematic Reviews. https://www.ncbi.nlm.nih.gov/books/NBK44029/ (2006).
Achiam, O. J. et al. GPT-4 Technical Report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2023).
Garcia, P. et al. Artificial intelligence–generated draft replies to patient inbox messages. JAMA Netw. Open 7, e243201–e243201 (2024).
Small, W. R. et al. Large language model–based responses to patients’ in-basket messages. JAMA Netw. Open 7, e2422399–e2422399 (2024).
Davis, S. E., Greevy, R. A. J., Lasko, T. A., Walsh, C. G. & Matheny, M. E. Detection of calibration drift in clinical prediction models to inform model updating. J. Biomed. Inf. 112, 103611 (2020).
Aaron, S., McEvoy, D. S., Ray, S., Hickman, T.-T. T. & Wright, A. Cranky comments: detecting clinical decision support malfunctions through free-text override reasons. J. Am. Med. Inform. Assoc. 26, 37–43 (2019).
McCoy, A. B. et al. Clinician collaboration to improve clinical decision support: the Clickbusters initiative. J. Am. Med. Inform. Assoc. 29, 1050–1059 (2022).
National Association of Community Health Centers & American Academy of Family Physicians. Closing the Primary Care Gap: How Community Health Centers Can Address the Nation’s Primary Care Crisis. https://www.nachc.org/wp-content/uploads/2023/06/Closing-the-Primary-Care-Gap_Full-Report_2023_digital-final.pdf (2023).
Your Clinical Decision Support Software: Is It a Medical Device? US Food and Drug Administration https://www.fda.gov/medical-devices/software-medical-device-samd/your-clinical-decision-support-software-it-medical-device (2022).
The Device Development Process. US Food and Drug Administration https://www.fda.gov/patients/learn-about-drug-and-device-approvals/device-development-process (2018).
Longhurst, C. A., Singh, K., Chopra, A., Atreja, A. & Brownstein, J. S. A call for artificial intelligence implementation science centers to evaluate clinical effectiveness. NEJM AI 1, AIp2400223 (2024).
Chin, M. H. et al. Guiding principles to address the impact of algorithm bias on racial and ethnic disparities in health and health care. JAMA Netw. Open 6, e2345050–e2345050 (2023).
Meskó, B. & Topol, E. J. The imperative for regulatory oversight of large language models (or generative AI) in healthcare. npj Digit. Med. 6, 120 (2023).
MIT Work of the Future. About Us. MIT Work of the Future https://workofthefuture-taskforce.mit.edu/mission/ (2024).
Coalition for Health AI, Inc. Our Purpose. CHAI https://chai.org/our-purpose/ (2024).
Stanford Medicine. Responsible AI for Safe and Equitable Health. RAISE Health https://med.stanford.edu/raisehealth (2024).
Stanford University. Stanford University Human-Centered Artificial Intelligence. https://hai.stanford.edu/ (2024).
Acknowledgements
J.G.Y. is supported by National Library of Medicine/National Institutes of Health grant [T15LM007092]. Research reported in this publication was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1TR003142. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author information
Contributions
T.H.B., A.L., and R.G.M. conceptualized the Perspective. J.G.Y. wrote the initial draft. T.H.B., M.A.P., A.L., and R.G.M. contributed to the first draft and provided critical revisions. All authors have read, reviewed, and approved of the final manuscript.
Ethics declarations
Competing interests
A.L. is a consultant for the Abbott Medical Device Cybersecurity Council. The other authors have no competing financial or non-financial interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
You, J.G., Hernandez-Boussard, T., Pfeffer, M.A. et al. Clinical trials informed framework for real world clinical implementation and deployment of artificial intelligence applications. npj Digit. Med. 8, 107 (2025). https://doi.org/10.1038/s41746-025-01506-4