External Validation of Artificial Intelligence for Detection of Heart Failure with Preserved Ejection Fraction

Ashley P. Akerman, Nora Al-Roub, Constance Angell-James, Madeline A. Cassidy, Rasheed Thompson, Lorenzo Bosque, Katharine Rainer, William Hawkes, Hania Piotrowska, Paul Leeson, Gary Woodward, Patricia A. Pellikka, Ross Upton & Jordan B. Strom


Background

Artificial intelligence (AI) models that identify heart failure (HF) with preserved ejection fraction (HFpEF) through deep learning of echocardiograms could help address under-recognition in clinical practice, but they require extensive validation, particularly in the representative and complex clinical cohorts in which they could provide the most value.

Methods

In this study enrolling patients with HFpEF (cases; n = 240) and controls matched on age, sex, and year of echocardiogram (n = 256), we compare the diagnostic performance (discrimination, calibration, classification, and clinical utility) and prognostic associations (mortality and HF hospitalization) of an updated AI HFpEF model (EchoGo Heart Failure v2) with those of existing clinical scores (H2FPEF and HFA-PEFF).

Results

The AI HFpEF model and the H2FPEF score demonstrate similar discrimination and calibration, but classification is higher with AI than with H2FPEF and HFA-PEFF. This is attributable to fewer intermediate scores with AI; in the clinical scores, intermediate results arise from discordant multivariable inputs. The continuous AI HFpEF model output adds information beyond the H2FPEF score, and integration with existing scores increases the number of correct management decisions. Patients with a positive diagnostic result from AI have a two-fold increased risk of the composite outcome of mortality or HF hospitalization.


Figure 1. Receiver operating characteristic (ROC) curves comparing discrimination for identification of HFpEF using the AI HFpEF model vs. the H2FPEF score. The AI HFpEF model is shown in blue (area under the ROC curve [AUROC]: 0.798 [95% CI 0.756–0.799]) and the H2FPEF score in orange (AUROC: 0.788 [0.745–0.789]). The difference between the two was not significant (mean difference in AUROC: 0.01 [–0.043 to 0.064]; p = 0.710, two-sided DeLong test).
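For readers less familiar with the discrimination metric reported in the figure, the AUROC can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below is a minimal illustration of that definition only. The function name `auroc` and the toy scores are hypothetical and are not study data, and this is not the analysis code used in the publication (which also applied a DeLong test, not shown here).

```python
def auroc(scores, labels):
    """AUROC as the fraction of (case, control) pairs in which the case
    has the higher score; tied pairs count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up example values for illustration only (NOT from the study).
toy_labels = [1, 1, 1, 0, 0, 0]               # 1 = case, 0 = control
toy_scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]   # hypothetical model outputs
toy_auc = auroc(toy_scores, toy_labels)
print(f"Toy AUROC: {toy_auc:.3f}")
```

In this toy example, eight of the nine case-control pairs are ranked correctly, so the AUROC is 8/9. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.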

Conclusion

We conclude that integrating an AI HFpEF model into the existing clinical diagnostic pathway would improve identification of HFpEF in complex clinical cohorts, and of patients at risk of adverse outcomes.