Evaluating the Performance and Potential Bias of Predictive Models for Detection of Transthyretin Cardiac Amyloidosis

Jonathan Hourmozdi, Nicholas Easton, Simon Benigeri, James D. Thomas, Akhil Narang, David Ouyang, Grant Duffy, Ross Upton, Will Hawkes, Ashley Akerman, Ike Okwuosa, Adrienne Kline, Abel N. Kho, Yuan Luo, Sanjiv J. Shah, and Faraz S. Ahmad


Background

Delays in the diagnosis of transthyretin amyloid cardiomyopathy (ATTR-CM) contribute to the significant morbidity of the condition, especially in the era of disease-modifying therapies. Screening for ATTR-CM with artificial intelligence and other algorithms may improve timely diagnosis, but these algorithms have not been directly compared.

Objectives

The aim of this study was to compare the performance of 4 algorithms for ATTR-CM detection in a heart failure population and assess the risk for harms due to model bias.

Methods

We identified patients with ATTR-CM in an integrated health system from 2010 to 2022 and matched them by age and sex to controls with heart failure, targeting a 5% prevalence. We compared the performance of a claims-based random forest model (Huda et al model), a regression-based score (Mayo ATTR-CM Score), and 2 deep learning echo models (EchoNet-LVH and EchoGo Amyloidosis). We evaluated the models for bias using standard fairness metrics.
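The 5% target prevalence fixes the ratio of controls to cases. As a rough sketch of that arithmetic (the function names are illustrative and are not taken from the study's code):

```python
def controls_for_target_prevalence(n_cases: int, target_prevalence: float) -> int:
    """Controls needed so that cases / (cases + controls) is close to the target."""
    if not 0 < target_prevalence < 1:
        raise ValueError("target_prevalence must be strictly between 0 and 1")
    # Solve p = cases / (cases + controls) for controls.
    return round(n_cases * (1 - target_prevalence) / target_prevalence)


def prevalence(n_cases: int, n_controls: int) -> float:
    """Fraction of the cohort made up of cases."""
    return n_cases / (n_cases + n_controls)


# Hypothetical example: 100 cases at a 5% target prevalence.
needed = controls_for_target_prevalence(100, 0.05)
print(needed, prevalence(100, needed))
```

At exactly 5% prevalence this works out to 19 controls per case. The reported cohort of 176 cases and 3,192 controls corresponds to a prevalence of about 5.2%.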

Results

The analytical cohort included 176 confirmed cases of ATTR-CM and 3,192 control patients; 79.2% self-identified as White and 9.0% as Black. The Huda et al model performed poorly (AUC: 0.49). Both deep learning echo models had a higher AUC than the Mayo ATTR-CM Score (EchoNet-LVH 0.88; EchoGo Amyloidosis 0.92; Mayo ATTR-CM Score 0.79; DeLong P < 0.001 for both). Bias auditing met fairness criteria for equal opportunity among patients who identified as Black.
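Equal opportunity is commonly assessed by comparing the true positive rate (sensitivity) across groups. The sketch below illustrates that idea on made-up labels. The group names, the toy data, and the 0.8 to 1.25 tolerance band are assumptions for illustration, not the study's data or its actual fairness threshold.

```python
from collections import defaultdict


def true_positive_rate_by_group(y_true, y_pred, groups):
    """Sensitivity (TP / all true positives) computed separately per group."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 1:
                true_pos[group] += 1
    return {g: true_pos[g] / positives[g] for g in positives}


def equal_opportunity_ratio(tpr, group, reference):
    """Ratio of a group's TPR to the reference group's TPR."""
    if tpr[reference] == 0:
        raise ValueError("reference group TPR is zero; ratio is undefined")
    return tpr[group] / tpr[reference]


def meets_equal_opportunity(ratio, tolerance=0.8):
    """Assumed rule: the ratio must fall within [tolerance, 1 / tolerance]."""
    return tolerance <= ratio <= 1 / tolerance


# Toy data (not from the study): 1 = disease present / flagged by the model.
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
groups = ["ref"] * 6 + ["cmp"] * 6

tpr = true_positive_rate_by_group(y_true, y_pred, groups)
ratio = equal_opportunity_ratio(tpr, "cmp", "ref")
print(tpr, ratio, meets_equal_opportunity(ratio))
```

In this toy example the comparison group's sensitivity (0.50) is well below the reference group's (0.75), so the assumed criterion is not met. A model that satisfies equal opportunity would show a ratio close to 1.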

This table compares the performance metrics of four models (Huda et al, Mayo ATTR-CM Score, EchoNet-LVH, and EchoGo Amyloidosis) for detecting ATTR-CM in a dataset of 176 cases and 3,192 controls. Metrics include F1 score, accuracy (Acc), sensitivity (Sens), specificity (Spec), positive predictive value (PPV), negative predictive value (NPV), false negative rate (FNR), average precision (AP), and area under the ROC curve (AUC), with 95% confidence intervals shown in parentheses. EchoGo® Amyloidosis demonstrated the highest sensitivity and accuracy, as well as the highest AUC (0.92).
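Most of the metrics listed above follow directly from the four cells of a confusion matrix at a fixed decision threshold. A minimal sketch, using hypothetical counts rather than the study's results:

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Threshold-dependent metrics derived from confusion-matrix counts."""
    sens = tp / (tp + fn)   # sensitivity (recall, true positive rate)
    spec = tn / (tn + fp)   # specificity (true negative rate)
    ppv = tp / (tp + fp)    # positive predictive value (precision)
    npv = tn / (tn + fn)    # negative predictive value
    return {
        "F1": 2 * ppv * sens / (ppv + sens),
        "Acc": (tp + tn) / (tp + fp + tn + fn),
        "Sens": sens,
        "Spec": spec,
        "PPV": ppv,
        "NPV": npv,
        "FNR": fn / (tp + fn),  # false negative rate = 1 - sensitivity
    }


# Hypothetical counts for illustration only (not from the study).
m = classification_metrics(tp=80, fp=100, tn=1800, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

AP and AUC are not included because they summarize performance across all thresholds and require the models' continuous scores rather than a single confusion matrix. Note also that PPV and NPV depend on prevalence, so values obtained in a cohort with about 5% prevalence would not carry over unchanged to populations where ATTR-CM is rarer or more common.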

Conclusions

Deep learning, echo-based models to detect ATTR-CM demonstrated the best overall discrimination when compared to 2 other models in external validation, with low risk of harms due to racial bias.