Advances in AI echocardiography are enabling earlier, more reliable detection of HFpEF and cardiac amyloidosis—backed by rigorous validation in real-world and external datasets.
In her ASE 2025 presentation, Dr. Patricia A. Pellikka (Mayo Clinic) walked through the validation journey behind AI echo models designed to detect HFpEF and cardiac amyloidosis. This article recaps her session, highlighting model performance, comparisons with traditional clinical scores, and why implementation-ready AI depends on robust external validation and real-world testing.
Let me take you back to 2019. AI in echocardiography was still quite new at that time, and there wasn’t much activity in the field. I wanted to build a model that would be truly impactful. The idea was to develop a model to detect HFpEF, because assessment of diastolic dysfunction is a challenge in echocardiography, and HFpEF is a major public health problem that remains difficult to diagnose.
I received a grant from the American Society of Echocardiography Foundation for this work and identified Ultromics as a partner. This has been a great partnership.
If you think about it, there is a tremendous amount of information within the apical four-chamber view. We wondered whether HFpEF could be detected using only this limited but information-rich view. You can appreciate left ventricular function, left atrial size, motion of the mitral and tricuspid annulus, wall thickness, and other features that are not always fully evaluated in routine practice.
We worked on developing this model using a control population of nearly 4,000 patients without heart failure—defined as no heart failure diagnosis for at least one year after the echocardiogram, normal ejection fraction, and normal or grade 1 diastolic dysfunction.
The HFpEF population consisted of nearly 3,000 patients with a clinical diagnosis of heart failure with preserved ejection fraction and elevated filling pressures. Seventeen percent of the data were withheld for validation.
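As a minimal sketch of how such a holdout might be drawn, the Python below withholds about 17% of patients from each class. The talk does not say how the split was made; stratifying by class, splitting at the patient level, and the helper name `stratified_holdout` are all assumptions for illustration.

```python
import random


def stratified_holdout(patient_ids, labels, holdout_frac=0.17, seed=0):
    """Withhold roughly holdout_frac of the patients in each class (hypothetical helper).

    Assumption: one study per patient, so every ID appears exactly once and no
    patient can land in both the training set and the holdout set.
    """
    if len(patient_ids) != len(labels):
        raise ValueError("need one label per patient")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_class = {}
    for pid, label in zip(patient_ids, labels):
        by_class.setdefault(label, []).append(pid)
    train, holdout = [], []
    for label in sorted(by_class):  # deterministic class order
        pids = list(by_class[label])
        rng.shuffle(pids)
        n_hold = round(len(pids) * holdout_frac)
        holdout.extend(pids[:n_hold])
        train.extend(pids[n_hold:])
    return train, holdout
```

Stratifying keeps the ratio of controls to HFpEF cases similar in both sets, so validation metrics are not skewed by a chance imbalance between classes.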
Data were obtained from multiple ultrasound systems. We analyzed sequences of 30 frames, with each frame resized to 256 × 256 pixels. Data augmentation was applied, including image flipping and gain variation.
The model output classified studies as suggestive of HFpEF, not suggestive of HFpEF, or uncertain.
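The preprocessing described above can be sketched with NumPy. This is not the Ultromics pipeline: evenly spaced frame sampling, nearest-neighbour resizing, a left-right flip, a gain factor drawn from 0.8 to 1.2, and the function names are all assumptions. Input clips are assumed to be grayscale arrays of shape (frames, height, width) with intensities in [0, 1].

```python
import numpy as np

N_FRAMES = 30  # frames per sequence, as stated in the talk
SIZE = 256     # output height and width in pixels, as stated in the talk


def sample_frames(clip, n_frames=N_FRAMES):
    """Pick n_frames evenly spaced frames from a (T, H, W) clip (assumed strategy)."""
    idx = np.linspace(0, clip.shape[0] - 1, n_frames).round().astype(int)
    return clip[idx]


def resize_nearest(clip, size=SIZE):
    """Nearest-neighbour resize of every frame to size x size."""
    _, h, w = clip.shape
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return clip[:, rows[:, None], cols[None, :]]


def augment(clip, rng, gain_range=(0.8, 1.2)):
    """Random left-right flip plus a random global gain, clipped back to [0, 1]."""
    if rng.random() < 0.5:
        clip = clip[:, :, ::-1]
    gain = rng.uniform(*gain_range)
    return np.clip(clip * gain, 0.0, 1.0)


def preprocess(clip, rng=None):
    """Sample, resize, and (only when an rng is given) augment one clip."""
    clip = resize_nearest(sample_frames(clip)).astype(np.float32)
    if rng is not None:
        clip = augment(clip, rng)
    return clip
```

Augmentation would normally be applied only to training clips, which is why `preprocess` skips it when no random generator is passed.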
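One plausible way to produce a three-way output is to place two thresholds on a continuous model score and call everything between them uncertain. The thresholds 0.3 and 0.7 and the names below are hypothetical; the talk does not describe how the uncertain band was defined.

```python
from enum import Enum


class EchoCall(Enum):
    NOT_SUGGESTIVE = "not suggestive of HFpEF"
    UNCERTAIN = "uncertain"
    SUGGESTIVE = "suggestive of HFpEF"


def classify(score, lower=0.3, upper=0.7):
    """Map a model probability to a three-way call (thresholds are hypothetical)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if lower >= upper:
        raise ValueError("lower threshold must be below upper threshold")
    if score < lower:
        return EchoCall.NOT_SUGGESTIVE
    if score > upper:
        return EchoCall.SUGGESTIVE
    return EchoCall.UNCERTAIN  # scores on or between the thresholds


def uncertainty_rate(scores, lower=0.3, upper=0.7):
    """Fraction of studies that fall in the uncertain band."""
    calls = [classify(s, lower, upper) for s in scores]
    return sum(c is EchoCall.UNCERTAIN for c in calls) / len(calls)
```

Widening the band raises the uncertainty rate but leaves fewer, more confident definitive calls; the roughly 7% uncertainty rate reported in external testing reflects whatever trade-off the actual model made.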
This AI echo model demonstrated excellent discrimination and classification. Performance metrics were evaluated for training, validation, and testing. External testing was performed at a different Mayo Clinic site using data that had not been used for training.
Controls and HFpEF patients were matched for sex and year of echocardiogram, with age matching as closely as possible. These echocardiograms were acquired by different sonographers and interpreted by different physicians.
The model produced an uncertainty rate of approximately 7%. The AUC in external testing was 0.91, with a sensitivity of 88% and specificity of 82%. Positive and negative predictive values were also favorable.
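For readers who want to check how these metrics relate, the sketch below computes sensitivity, specificity, PPV, and NPV from a confusion matrix, and AUC as the probability that a randomly chosen HFpEF case scores higher than a randomly chosen control. It assumes uncertain studies are excluded before counting, which the talk does not state. Because PPV and NPV depend on prevalence, Bayes' rule helpers are included; any counts used with this code are illustrative, not study data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # HFpEF called suggestive
    fp: int  # controls called suggestive
    tn: int  # controls called not suggestive
    fn: int  # HFpEF called not suggestive

    @property
    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self):
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self):
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self):
        return self.tn / (self.tn + self.fn)


def ppv_at_prevalence(sens, spec, prev):
    """Bayes' rule: expected PPV at a given disease prevalence."""
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    return tp / (tp + fp)


def npv_at_prevalence(sens, spec, prev):
    """Bayes' rule: expected NPV at a given disease prevalence."""
    tn = spec * (1 - prev)
    fn = (1 - sens) * prev
    return tn / (tn + fn)


def auc(pos_scores, neg_scores):
    """AUC as P(random positive outscores random negative); ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, with the reported sensitivity (0.88) and specificity (0.82), `ppv_at_prevalence(0.88, 0.82, 0.1)` comes to about 0.35, so predictive values measured in a case-control cohort enriched for HFpEF should not be read as values for a lower-prevalence screening population.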
We wanted to ensure the model was assessing something clinically meaningful. We examined age-adjusted mortality risk in relation to AI model output. Survival was worse in patients classified as HFpEF compared with those without HFpEF, with intermediate outcomes observed in patients with uncertain classification.
We also evaluated how the model performed compared with standard clinical models for detecting HFpEF. The two most widely used models are the HFA-PEFF score and the H₂FPEF score. A major limitation of both is the high number of indeterminate results they produce.
Our AI echo model was able to reclassify approximately 75% of these indeterminate cases, yielding a definitive diagnostic result.
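The reclassification figure is a simple proportion: among patients whose clinical score fell in the indeterminate range, the share for whom the AI model returned a definitive positive or negative call. The sketch assumes each clinical score is reported as "low", "intermediate", or "high", with "intermediate" treated as indeterminate, and that AI calls are "positive", "negative", or "uncertain"; these labels and the function name are hypothetical.

```python
def reclassification_rate(clinical_calls, ai_calls):
    """Share of clinically indeterminate cases that the AI model resolves definitively.

    Hypothetical labels: clinical calls are "low" / "intermediate" / "high";
    AI calls are "positive" / "negative" / "uncertain".
    """
    if len(clinical_calls) != len(ai_calls):
        raise ValueError("need one AI call per clinical call")
    resolved = total = 0
    for clinical, ai in zip(clinical_calls, ai_calls):
        if clinical != "intermediate":
            continue  # only indeterminate clinical scores are candidates
        total += 1
        if ai in ("positive", "negative"):
            resolved += 1
    return resolved / total if total else float("nan")
```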
In one example, a 53-year-old woman with a murmur had a very low AI score, confidently indicating no HFpEF. In contrast, a 56-year-old woman with dyspnea and acute-on-chronic HFpEF had a high score with strong confidence in interpretation.
We also evaluated whether the model could identify patients at risk for heart failure hospitalization and cardiac mortality. In a separate study using a second version of the model, outcomes were adjudicated independently using the National Death Index.
Patients were stratified by AI output score quartile. Heart failure hospitalization and cardiac mortality were both higher in patients with positive AI classifications. The AI model outperformed traditional clinical scores for both outcomes.
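Quartile stratification can be sketched as below: cut the AI output scores at their 25th, 50th, and 75th percentiles and compare the event proportion in each group. This is a simplification that ignores follow-up time and censoring; the study's actual outcome analysis is not described in the talk, and the function name is hypothetical.

```python
import numpy as np


def event_rate_by_quartile(scores, events):
    """Crude event proportion in each AI-score quartile (no censoring or follow-up time)."""
    scores = np.asarray(scores, dtype=float)
    events = np.asarray(events, dtype=bool)
    cuts = np.quantile(scores, [0.25, 0.5, 0.75])
    # index 0 = lowest-score quartile (Q1), 3 = highest (Q4); ties at a cut go upward
    quartile = np.searchsorted(cuts, scores, side="right")
    return [
        events[quartile == q].mean() if np.any(quartile == q) else float("nan")
        for q in range(4)
    ]
```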
External validation is critically important for AI echo models. This work was further validated by collaborators at Beth Israel Deaconess Medical Center. AI echo classification outperformed the clinical scores, and although its discrimination was similar to that of the H₂FPEF score, combining AI echo with clinical scoring provided the best approach for identifying patients appropriate for treatment.
We also evaluated performance across different ultrasound systems. Using a point-of-care handheld device and comparing results with full transthoracic echocardiograms from a different vendor, AI prediction scores were consistent. Patients who were positive on both systems had higher clinical scores, while those negative on both had lower scores.
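Agreement between two devices reading the same patients can be summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The talk does not say which agreement statistic was used; kappa on binary calls is an assumption, and the labels below are illustrative.

```python
def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two raters (here, two devices) labelling the same patients."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two equal-length, non-empty lists of calls")
    n = len(calls_a)
    labels = set(calls_a) | set(calls_b)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # chance agreement: product of each device's marginal rate for every label
    expected = sum((calls_a.count(k) / n) * (calls_b.count(k) / n) for k in labels)
    if expected == 1.0:
        return 1.0  # both devices gave the same single label to every patient
    return (observed - expected) / (1 - expected)
```

Kappa is preferred over raw percent agreement here because, if most patients are negative on both devices, high raw agreement can arise largely by chance.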
Our next challenge was developing an AI echo model to detect cardiac amyloidosis. This is a major clinical challenge, and early diagnosis is critical now that effective treatments are available.
We examined the prevalence of cardiac amyloidosis in patients undergoing echocardiography at Mayo Clinic Rochester over one year. Approximately 1.25% had cardiac amyloidosis, with prevalence increasing with age. Importantly, about 0.5% of patients had echocardiographic features suggestive of cardiac amyloidosis, but the diagnosis was not pursued. This represents a significant opportunity for AI-based detection.
The AI model developed with Ultromics was compared with multiple existing models, including claims-based models, logistic regression clinical models, and other deep learning echo approaches. EchoGo® Amyloidosis demonstrated strong performance with a high AUC.
These deep learning models performed better overall, and importantly, evaluation showed a low risk of harm from racial bias.
We are extremely excited about the work we have done developing this model for detection of cardiac amyloidosis. I will now turn this over to Dr. Jeremy Slivnick, who will discuss the study recently published in the European Heart Journal. Thank you for your attention.