FDA's Digital Health Advisory Committee Discusses Total Product Lifecycle Considerations for GenAI-Enabled Devices
In Short
The Situation: While generative artificial intelligence ("GenAI") has the potential to fundamentally change health care, it presents unique risks and complexities that challenge the U.S. Food and Drug Administration's ("FDA" or "Agency") traditional approach to regulating medical devices.
The Result: FDA held the first meeting of its Digital Health Advisory Committee ("DHAC"), at which committee members offered recommendations on how FDA should review and regulate GenAI-enabled devices.
Looking Ahead: Given the unique characteristics of GenAI, the DHAC meeting highlights the potential need for additional regulatory controls and approaches to ensure GenAI-enabled devices are safe and effective throughout the total product lifecycle.
On November 20–21, 2024, FDA held the first meeting of the DHAC to obtain feedback and recommendations on how it should review and regulate GenAI-enabled devices through the total product lifecycle approach, which the Agency has long promoted in the oversight of devices, including non-GenAI-enabled devices. During opening remarks, FDA Commissioner Dr. Robert Califf stated that while the Agency has yet to authorize any GenAI-enabled devices, it must embrace new technologies like GenAI "not only to keep pace with the industries we regulate, but also to use regulatory channels and oversight to improve the chance that they will be applied effectively, consistently, and fairly."
GenAI has broad health care applications, including the potential to streamline clinical workflows, improve diagnostic accuracy, assist in medical training, and improve health care efficiency and accessibility, particularly in underserved communities. While it has tremendous upside, GenAI also introduces new considerations related to the scope of the intended use, oversight and transparency, adequacy and diversity of the data, and how to evaluate and monitor real-world performance (e.g., bias and hallucinations).
As with all medical devices, FDA's regulatory oversight applies to GenAI-enabled products that meet the definition of "device," and the Agency applies a risk-based approach that considers the product's intended use and technological characteristics. As noted in FDA's executive summary shared before the meeting, the use and adoption of GenAI present risks and complexities that challenge FDA's traditional approach to device regulation. Specifically, FDA faces challenges in applying a risk-based approach to classification and in determining regulatory requirements for GenAI-enabled devices.
For example, GenAI-enabled products evolve over time and may exhibit performance bias or hallucinations (which are often hard to identify or explain). The resulting uncertainty and risk can make it difficult to determine how a product's intended use, functionality, and level of risk align with FDA's current digital health policies, e.g., the Policy for Device Software Functions and Mobile Medical Applications. Generally, these policies are scoped to specific intended uses and low-risk products.
Further, GenAI's technological characteristics may introduce new or different risks, which may raise new questions of safety and effectiveness. Those questions could in turn affect the device's classification or premarket pathway, as well as the kinds of regulatory controls necessary to ensure safety and effectiveness. The Agency also faces challenges in determining the types of valid scientific evidence it needs to evaluate the safety and effectiveness of GenAI-enabled devices over the total product lifecycle. For example, because GenAI models can accept open-ended inputs, FDA cannot evaluate every input a model might encounter during premarket review. Evaluation is especially difficult when the sponsor lacks control over, or visibility into, the datasets or parameters of the underlying foundation model.
With these challenges in mind, FDA's discussion questions asked DHAC to consider what information and practices are necessary for a comprehensive approach to risk management throughout the total product lifecycle for GenAI-enabled devices, namely, what information the Agency needs to evaluate pre- and post-market to support safety and effectiveness. Over the two-day meeting, DHAC members proposed a framework based on three key areas: (i) premarket performance evaluation; (ii) risk management; and (iii) post-market performance monitoring.
Premarket Performance Evaluation
The majority of the discussion on the first day of the meeting was focused on premarket performance evaluation, and DHAC made numerous recommendations to FDA on what information it should require from sponsors when evaluating the safety and effectiveness of GenAI-enabled devices.
DHAC agreed with FDA that, compared to non-GenAI-enabled devices, premarket submissions for GenAI-enabled devices will require more information and a higher level of detail regarding the underlying device design, performance-testing requirements, GenAI model implementation, and underlying foundation model, similar to FDA's current approach to premarket submissions for devices that incorporate off-the-shelf software.
Specifically, members said that a premarket submission should include the intended use case, the intended population, and a description of the design specifications, data management, characteristics, and development of the initial foundation model. It should also include specific details on data management and fine-tuning for the GenAI model underlying the particular GenAI-enabled device. This would cover the population on which the model was trained, how it was tested, and whether it is dynamic or autonomous, including the level of autonomy or control given to the end user and whether humans are in, on, or out of the loop. Submissions would also need to address whether the model creates bounded or unbounded outputs; the setting where it is used; estimates of hallucinations, error rates, and uncertainty around the data underlying the model (e.g., stress testing results); and cybersecurity and data privacy standards.
Relatedly, DHAC agreed that FDA should develop a standard form that sponsors can use to disclose the data used to develop, test, and validate the GenAI algorithm. Members also observed that the premarket submission should describe the boundaries of the data types on which the algorithm was trained, to help control for erroneous outputs, data drift, and hallucinations. Additional important considerations include how different inputs, populations, settings, and temperature parameters (settings that control the randomness of a model's outputs) affect performance or outputs.
DHAC also recommended that FDA require sponsors to submit a post-market performance monitoring plan (similar to special controls for class II devices), as well as the results of performance assessments, such as benchmarking against standardized reference datasets to directly compare the GenAI model to existing non-GenAI models with known performance parameters. Notably, certain members questioned the usefulness of benchmarking, given that GenAI models continuously change over time, and suggested that other performance assessment strategies, such as expert evaluation and model-based evaluation, may be better suited. Members also questioned whether new evaluation methodologies and performance metrics should be developed.
Risk Management
DHAC uniformly agreed that new controls are needed to mitigate risks associated with GenAI-enabled devices, such as governance strategies, training, feedback mechanisms, and real-world performance evaluations. For example, DHAC generally believed that the output of GenAI-enabled devices should be categorized based on risk.
Members further emphasized that human-in-the-loop or expert-in-the-loop feedback is essential for transparency and can help convey confidence levels so that users understand the reliability of outputs. DHAC considered whether FDA should recommend or require that providers and patients receive training before using GenAI-enabled devices, particularly given the risks of automation bias and human over-reliance on GenAI outputs.
Some members also argued for required labeling studies and specific labeling requirements disclosing to users that the outputs of GenAI-enabled devices are produced by AI. Transparent watermarking could also help distinguish GenAI-generated outputs from human-generated ones, facilitating precise tracking and auditing for relevance and accuracy.
DHAC recommended that FDA create a centralized database for users to report the existence and type of errors, such as product and implementation failures. Metrics for adverse event reporting, such as accuracy, safety, and bounds of intended use, should also be established to ensure that GenAI-enabled devices function within their intended scope and that deviations are quickly addressed.
Some members argued that, consistent with principles embodied by value-based care, accountability should be shared among manufacturers, deployers (e.g., health systems), and providers in order to incentivize risk mitigation. However, others did not believe accountability falls under FDA's regulatory purview because assigning it may, for example, result in FDA regulating the practice of medicine. Likewise, some viewed a proposal to ban off-label use of GenAI as FDA overstepping its bounds into the practice of medicine.
Post-Market Performance Monitoring
Lastly, DHAC stressed that continuous post-market performance monitoring, oversight, and evaluation are vitally important for GenAI-enabled devices, and recommended that FDA develop a uniform standard and infrastructure for them, which may include tracking accuracy or data drift, detecting bias, and addressing variations in data across multiple sites. In the committee's view, post-market testing and clinical validation should be performed frequently to ensure that the GenAI model continues to perform consistently with the premarket version, even as it receives additional data and adapts to different sites or conditions once on the market.
DHAC discussed with FDA how predetermined change control plans ("PCCPs") could serve as an effective and efficient mechanism to monitor and manage changes to GenAI-enabled devices. However, for PCCPs to be used for GenAI-enabled devices, FDA stated that it would likely consider:
- how specific the planned modifications to the GenAI algorithm can be;
- what boundaries or guardrails the PCCP establishes to define the range of automatic updates;
- how post-market performance will be monitored over time to ensure that performance is maintained or improved;
- how labeling will be updated to notify users when modifications are automatically implemented; and
- appropriate requirements for notifying FDA and users if the device does not function as intended under the PCCP.
Industry Implications
DHAC's recommendations to FDA are the latest in a string of policies, proposals, and regulations by various federal and state regulators that have implications for both the manufacturers of GenAI-enabled devices and the providers that utilize them to care for patients. The patchwork and rapidly evolving nature of this oversight is likely to make it challenging for the industry to keep up.
For example, recent state policy efforts emphasizing human accountability and responsibility for the use of GenAI-enabled devices conflict with DHAC's suggestion that accountability should be shared among manufacturers, deployers, and providers. While DHAC acknowledges that human oversight is key to mitigating risk, initial state medical board policy materials emphasize that providers are ultimately responsible for their use of such devices and therefore should be held accountable for harms that occur.
Similarly, FDA-required training for providers and/or patients utilizing GenAI-enabled devices (as considered by DHAC during the meeting and likely directed at the role of manufacturers in such training) could lead to inconsistencies with early and evolving state legislation and state-led efforts to regulate providers' use of AI.
Conclusion
While it is not yet clear which of the DHAC meeting's ideas or recommendations FDA will ultimately adopt, it is clear that the advisory committee believes new or additional regulatory controls may be necessary to ensure that GenAI-enabled devices remain safe and effective throughout the total product lifecycle, ultimately protecting patients and benefiting the public health.
FDA will consider additional public feedback submitted as comments to the public docket (FDA-2024-N-3924) before January 21, 2025.
Three Key Takeaways
- The DHAC meeting underscores that the total product lifecycle approach remains important to ensuring that future GenAI-enabled devices are safe and effective.
- New regulatory approaches for premarket performance evaluation, risk management, and post-market performance monitoring may be necessary for GenAI-enabled devices.
- Appropriate regulation of GenAI is vital to realizing its potential to fundamentally transform the health care industry, because it will promote the responsible development and deployment of this technology in health care settings and help ensure that it is safe, clinically useful, and aligned with improving clinical outcomes.