Clinical Evaluation for Software as a Medical Device (SaMD) Under EU MDR

How to build a clinical evaluation for SaMD under MDR — what counts as clinical evidence for software, how MDCG 2020-1 shapes the methodology, and where software teams get into trouble.

Clinical evaluation for software devices follows the same Article 61 and Annex XIV framework as any other medical device — but in practice, software teams run into different problems. The challenge isn't the framework itself; it's that software clinical evidence looks different from hardware clinical evidence, and teams that apply hardware CER conventions to software often produce documentation that doesn't hold up under scrutiny.

What counts as clinical evidence for software

For most hardware devices, the core clinical evidence comes from clinical investigations or literature on the device technology. For software, particularly diagnostic or decision-support software, the evidence landscape is different:

Algorithm validation studies — for AI-based and diagnostic software, the primary clinical evidence is often a prospective or retrospective validation study demonstrating that the algorithm performs as claimed on a representative patient population. This is not the same as technical testing. A validation study must use clinically meaningful endpoints — sensitivity, specificity, NPV, PPV, AUC — not just technical accuracy metrics.

Clinical investigation data — higher-risk SaMD (particularly Class IIb and III) may need prospective clinical investigations under Annex XV, demonstrating real-world clinical impact, not just algorithmic performance on test sets.

Published literature — for established software technologies (e.g., well-validated image analysis algorithms), published literature may be available. The challenge is ensuring the literature covers your specific algorithm version, training data, intended use, and target population. Generic literature on "AI in radiology" does not substitute for evidence on your specific product.
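The clinically meaningful endpoints named above for algorithm validation studies can all be derived from a comparison of the algorithm output against the clinical reference standard. The following is a minimal sketch, not a validated statistical tool: the class and function names are hypothetical, the counts are invented for illustration, and a real study would also report confidence intervals and pre-specified acceptance criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing algorithm output with the clinical reference standard."""
    tp: int  # reference positive, algorithm positive
    fp: int  # reference negative, algorithm positive
    tn: int  # reference negative, algorithm negative
    fn: int  # reference positive, algorithm negative


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a category is empty.
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute clinically oriented endpoints plus overall accuracy."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "accuracy": _ratio(c.tp + c.tn, total),
    }


# Illustrative counts only (assumption), not data from any real study.
counts = ConfusionCounts(tp=90, fp=40, tn=860, fn=10)
metrics = diagnostic_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts, overall accuracy is 95% while PPV is about 69%, which is the kind of gap a single technical accuracy figure hides.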

MDCG 2020-1 and what it changes

MDCG 2020-1 (Clinical Evaluation of Medical Device Software) is the guidance document that defines what a rigorous clinical evaluation looks like for software. The key things it establishes:

MDCG 2020-1 structures the evidence around three components: valid clinical association, technical performance, and clinical performance. Clinical performance asks whether the software produces clinically relevant output, accurately and reliably, for the intended purpose and population. Valid clinical association asks whether there is evidence that the software's output is meaningfully linked to the targeted clinical condition or physiological state. For diagnostic software, this means demonstrating that the information it provides is clinically valid and actionable, not just statistically correlated.

MDCG 2020-1 also distinguishes technical performance, often called analytical validation (does the software accurately, reliably, and precisely generate the intended output from the input data?), from clinical performance, or clinical validation (does that output yield clinically relevant results when the software is used as intended?). Both are required. Teams that produce only technical validation, showing that the algorithm performs well on a holdout test set, without demonstrating clinical benefit have an incomplete evaluation.

The version control problem

Software is updated more frequently than hardware, and each update potentially changes the clinical evidence base. One thing that catches SaMD teams off guard is that significant software changes can trigger a requirement to update the clinical evaluation — and in some cases, to generate new clinical evidence.

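MDCG 2020-1 treats clinical evaluation as a continuous process across the software lifecycle. MDCG 2020-3 is the guidance on significant changes under the MDR Article 120 transitional provisions, and it includes software-specific criteria for what constitutes a significant change. It is not the software qualification guidance; qualification and classification are covered by MDCG 2019-11. A change to the algorithm, including one intended to improve performance, is generally significant. A change that alters the intended population is significant. A UI change that affects how clinicians interact with outputs may be significant. If your team treats software clinical evaluation as a one-time exercise rather than a document that evolves with your software version, this will surface at your next Notified Body audit.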

The practical implication: your clinical evaluation plan should define upfront what types of changes trigger a CER update. Build this into your software change management process. Without it, your clinical team will always be catching up to your development team.
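One way to build this into change management is a small trigger table that the release process consults for every change. The sketch below is an assumption about how such a rule set could look; the change tags, the three actions, and the default for unknown tags are hypothetical and would need to come from your own clinical evaluation plan and the applicable guidance.

```python
from enum import Enum


class CerAction(Enum):
    NO_UPDATE = "no CER update needed"
    REVIEW = "clinical review decides"
    UPDATE = "CER update required"


# Hypothetical trigger table; real categories belong in the clinical evaluation plan.
TRIGGERS: dict[str, CerAction] = {
    "algorithm_change": CerAction.UPDATE,
    "intended_population_change": CerAction.UPDATE,
    "intended_purpose_change": CerAction.UPDATE,
    "ui_change_affecting_output_display": CerAction.REVIEW,
    "bug_fix_no_output_change": CerAction.NO_UPDATE,
}

# Actions ordered from least to most demanding.
_SEVERITY = [CerAction.NO_UPDATE, CerAction.REVIEW, CerAction.UPDATE]


def required_action(change_tags: list[str]) -> CerAction:
    """Return the most demanding action across all tags on a release.

    Unknown tags default to REVIEW so that unclassified changes are
    never silently waved through.
    """
    actions = [TRIGGERS.get(tag, CerAction.REVIEW) for tag in change_tags]
    return max(actions, key=_SEVERITY.index, default=CerAction.NO_UPDATE)


release = ["bug_fix_no_output_change", "ui_change_affecting_output_display"]
print(required_action(release).value)
```

Defaulting unknown tags to a clinical review is a deliberate design choice here: it keeps the clinical team in the loop whenever development introduces a kind of change the plan did not anticipate.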

Where the benefit-risk analysis gets complicated

For SaMD, the benefit-risk analysis must engage with the specific clinical context of use. This is where many software CERs are thin. A benefit-risk analysis that says "the software assists clinicians in diagnosing X condition, which improves patient outcomes" without supporting data on the magnitude and frequency of that benefit — and without addressing the risk of false positives or false negatives in the clinical workflow — will not satisfy the MDR requirement.

For diagnostic software, the most important risks to address are algorithm errors in the patient population (false positives leading to unnecessary treatment, false negatives leading to missed diagnosis) and the impact of those errors relative to the clinical context. A false negative from a triage tool is very different in consequence from a false negative from a diagnosis confirmation tool, even if the algorithm accuracy is the same.
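To make the effect of clinical context concrete, one option is to translate algorithm performance into expected error counts per 1,000 patients in each setting. The sketch below assumes a hypothetical algorithm with fixed sensitivity and specificity used in two settings that differ in disease prevalence; the function name and all numeric values are illustrative assumptions, not figures from any study. It does not capture what happens downstream of each error, which the benefit-risk analysis still has to argue in words.

```python
def expected_errors_per_1000(
    sensitivity: float, specificity: float, prevalence: float
) -> dict[str, float]:
    """Expected false negatives and false positives per 1,000 patients."""
    diseased = 1000 * prevalence
    healthy = 1000 - diseased
    return {
        "false_negatives": diseased * (1 - sensitivity),
        "false_positives": healthy * (1 - specificity),
    }


# Same hypothetical algorithm, two hypothetical contexts of use.
SENSITIVITY = 0.90
SPECIFICITY = 0.90

# Low-prevalence setting, e.g. a broad triage population (assumed 2%).
triage = expected_errors_per_1000(SENSITIVITY, SPECIFICITY, prevalence=0.02)

# High-prevalence setting, e.g. confirming a suspected diagnosis (assumed 50%).
confirmation = expected_errors_per_1000(SENSITIVITY, SPECIFICITY, prevalence=0.50)

for label, result in (("triage", triage), ("confirmation", confirmation)):
    print(
        f"{label}: {result['false_negatives']:.0f} false negatives, "
        f"{result['false_positives']:.0f} false positives per 1,000"
    )
```

Under these assumptions the same algorithm yields mostly false positives in the low-prevalence setting and an even split of errors in the high-prevalence one, so the error mode that dominates the risk discussion changes with the context of use.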

What a solid SaMD CER looks like

A well-structured CER for SaMD addresses: the intended purpose and clinical context precisely (who is the user, what decision does the output support, where in the clinical pathway does it sit); algorithm validation data with clinically relevant endpoints on a representative population; clinical benefit evidence that links the software output to patient-level outcomes; and a benefit-risk analysis that specifically addresses algorithm error modes in the clinical context.

If your SaMD uses AI/ML, the CER should also address how the algorithm's performance was validated across demographic subgroups. Notified Bodies are increasingly asking about this, particularly for diagnostic software where performance disparities across gender, age, or ethnicity are a known issue in the field.
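A subgroup analysis of this kind can be as simple as stratifying the validation results before computing each endpoint. The following is a minimal sketch for sensitivity only, assuming each validation record carries a subgroup label, the reference-standard result, and the algorithm result; the function name, record layout, and sample records are hypothetical, and a real analysis would add confidence intervals and check that each subgroup is large enough to support a conclusion.

```python
from collections import defaultdict
from typing import Iterable

# (subgroup label, reference standard positive?, algorithm positive?)
Record = tuple[str, bool, bool]


def sensitivity_by_subgroup(records: Iterable[Record]) -> dict[str, float]:
    """Sensitivity per subgroup, computed over reference-positive cases only."""
    true_pos: dict[str, int] = defaultdict(int)
    false_neg: dict[str, int] = defaultdict(int)
    for subgroup, reference_positive, predicted_positive in records:
        if not reference_positive:
            continue  # negatives do not contribute to sensitivity
        if predicted_positive:
            true_pos[subgroup] += 1
        else:
            false_neg[subgroup] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}


# Invented records for illustration only.
sample = [
    ("age<65", True, True),
    ("age<65", True, True),
    ("age<65", True, False),
    ("age<65", False, False),
    ("age>=65", True, True),
    ("age>=65", True, False),
]
by_group = sensitivity_by_subgroup(sample)
for group in sorted(by_group):
    print(f"{group}: sensitivity {by_group[group]:.2f}")
```

Subgroups with no reference-positive cases are omitted from the result rather than reported as zero, which avoids presenting an absence of data as poor performance.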
