Variable selection results
A detailed explanation of all the variables collected in this study can be found in Supplementary Table 1. After screening, we included 10 variables in three categories: demographic data, echocardiographic parameters, and serological indicators. Demographic data included systolic blood pressure (SBP). Echocardiographic parameters included left atrial diameter (LA) and left ventricular ejection fraction (LVEF). These were measured at baseline by routine transthoracic echocardiography (TTE) performed by a certified cardiologist and were collected from the EHR. Serological indicators included white blood cell count (WBC), neutrophil count (NC), hemoglobin (Hb), N-terminal pro-brain natriuretic peptide (NT-proBNP), uric acid (UA), the ratio of low-density lipoprotein cholesterol to high-density lipoprotein cholesterol (LDL-C/HDL-C), and the atherogenic index of plasma (AIP). AIP is the logarithmically transformed ratio of triglycerides (TG) to high-density lipoprotein cholesterol (HDL-C), both expressed as molar concentrations (mmol/L), and is calculated as log(TG/HDL-C) (ref. 33). All serological indicators were obtained from the first peripheral blood sample collected at baseline. The distribution of these 10 variables and their correlation with AF diagnostic subtypes in the three independent centers are shown in Fig. 2A1-A10 and B1-B3.
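The two derived lipid variables can be illustrated with a minimal sketch. The function names are hypothetical, and the base-10 logarithm is an assumption taken from the conventional definition of AIP, since the text above writes only "log"; the input concentrations in the example are invented for illustration.

```python
import math


def atherogenic_index_of_plasma(tg_mmol_l: float, hdl_c_mmol_l: float) -> float:
    """Return AIP = log10(TG / HDL-C), with both inputs in mmol/L.

    Base 10 is assumed here (conventional AIP definition).
    """
    if tg_mmol_l <= 0 or hdl_c_mmol_l <= 0:
        raise ValueError("TG and HDL-C must be positive concentrations in mmol/L")
    return math.log10(tg_mmol_l / hdl_c_mmol_l)


def ldl_hdl_ratio(ldl_c_mmol_l: float, hdl_c_mmol_l: float) -> float:
    """Return the LDL-C/HDL-C ratio (unitless when both use the same unit)."""
    if hdl_c_mmol_l <= 0:
        raise ValueError("HDL-C must be a positive concentration")
    return ldl_c_mmol_l / hdl_c_mmol_l


# Invented example values: TG 1.5 mmol/L, HDL-C 1.2 mmol/L, LDL-C 2.8 mmol/L
aip = atherogenic_index_of_plasma(1.5, 1.2)
ratio = ldl_hdl_ratio(2.8, 1.2)
print(f"AIP = {aip:.3f}, LDL-C/HDL-C = {ratio:.3f}")
```

Under this definition AIP is negative whenever TG is lower than HDL-C, which is why the interquartile ranges reported for AIP span zero.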
Baseline characteristics of participants
Initially, we collected data on 11,986 patients from DH, SYSMH, and SSH (6436, 4581, and 969 patients, respectively; Fig. 1B). After applying the inclusion and exclusion criteria, we enrolled 4185 patients: 2248 in DH, 1599 in SYSMH, and 338 in SSH. The number of patients with paroxysmal AF and persistent AF was 2565 (61.29%) and 1620 (38.71%), respectively. Within each center, the proportions of paroxysmal and persistent AF were also close to 60% and 40% (1361 and 887 in DH, 1019 and 580 in SYSMH, 187 and 151 in SSH). The baseline characteristics of all variables included in the model are shown in Table 1.
Table 1
Baseline characteristics of participants
Variables | Total (n = 4185) | DH (n = 2248) | SYSMH (n = 1599) | SSH (n = 338) | p-Value |
|---|
Age (years) | 68 (59, 77) | 69 (59, 78) | 67 (60, 75) | 69 (58, 78) | < 0.001* |
Gender (n, %) | | | | | 0.797 |
Male (%) | 2468 (58.97) | 1320 (58.71) | 943 (58.97) | 205 (60.65) | |
Female (%) | 1717 (41.03) | 928 (41.29) | 656 (41.03) | 133 (39.35) | |
SBP (mmHg) | 131 (117, 147) | 135 (118, 150) | 127 (116, 142) | 132 (118, 147) | < 0.001* |
WBC (×10⁹/L) | 7.08 (5.73, 9.09) | 7.43 (5.84, 9.74) | 6.72 (5.67, 8.24) | 7.17 (5.51, 9.19) | < 0.001* |
NC (×10⁹/L) | 4.53 (3.36, 6.45) | 4.86 (3.35, 7.23) | 4.21 (3.35, 5.50) | 5.07 (3.17, 7.34) | < 0.001* |
Hb (g/L) | 132 (118, 146) | 131 (115, 145) | 135 (123, 147) | 130 (117, 145) | < 0.001* |
AIP | 0.032 (-0.156, 0.226) | 0.014 (-0.180, 0.217) | 0.055 (-0.128, 0.241) | 0.009 (-0.173, 0.213) | < 0.001* |
LDL-C/HDL-C | 2.353 (1.733, 3.099) | 2.256 (1.637, 3.045) | 2.521 (1.944, 3.203) | 2.055 (1.464, 2.792) | < 0.001* |
NT-proBNP (pg/ml) | 792 (209, 2337) | 1144 (312, 3053) | 470 (128, 1190) | 1223 (324, 3469) | < 0.001* |
UA (µmol/L) | 384 (309, 468) | 381 (301, 473) | 391 (320, 468) | 364 (289, 446) | < 0.001* |
LA (mm) | 37 (32, 43) | 36 (31, 42) | 38 (35, 43) | 40 (35, 45) | < 0.001* |
LVEF (%) | 64 (57, 68) | 62 (55, 67) | 66 (61, 70) | 59 (50, 65) | < 0.001* |
Diagnose (n, %) | | | | | 0.008* |
0 | 2565 (61.29) | 1361 (60.54) | 1019 (63.73) | 187 (55.33) | |
1 | 1620 (38.71) | 887 (39.46) | 580 (36.27) | 151 (44.67) | |
Values are presented as n (%) or median (interquartile range, IQR), as appropriate. DH, Donghua Hospital of Sun Yat-sen University; SYSMH, Sun Yat-sen Memorial Hospital of Sun Yat-sen University; SSH, Dongguan Songshan Lake Donghua Hospital; SBP, systolic blood pressure; WBC, white blood cell count; NC, neutrophil count; Hb, hemoglobin; AIP, atherogenic index of plasma; LDL-C, low-density lipoprotein cholesterol; HDL-C, high-density lipoprotein cholesterol; NT-proBNP, N-terminal pro-brain natriuretic peptide; UA, uric acid; LA, left atrial diameter; LVEF, left ventricular ejection fraction; Diagnose, 0 = paroxysmal AF, 1 = persistent AF.
Results of AF subtype prediction model
As described in the Methods section, we used five machine learning methods to build the model. We used the DH dataset, which had the largest sample size, as the primary dataset and performed five-fold cross-validation. As shown in Fig. 1C, in each fold the DH dataset was split into a training set and an internal validation set at a ratio of 8:2, and the SYSMH and SSH datasets served as independent external validation sets. This strategy reduces the influence of any single data partition and makes the results more reliable. The output variable of the model was the predicted AF diagnostic subtype, which was compared with the diagnosis recorded in the EHR at discharge. Comparing the evaluation indicators, we found that the model established in our study had good predictive performance and stable generalizability (Table 2).
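The 8:2 partitioning described above can be sketched as follows. This is a standard-library illustration only, not the study's code: the function name is hypothetical, and stratifying the folds by AF subtype is an assumption, since the text does not state whether the split was stratified. The label counts are the DH figures from Table 1.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs, spreading each class evenly over k folds.

    Stratification is an assumption of this sketch.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal shuffled indices round-robin so every fold keeps the class ratio
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, val


# DH cohort from Table 1: 1361 paroxysmal (0) and 887 persistent (1) cases
dh_labels = [0] * 1361 + [1] * 887
splits = list(stratified_kfold(dh_labels, k=5, seed=42))

for fold_no, (train_idx, val_idx) in enumerate(splits, start=1):
    share = len(val_idx) / len(dh_labels)
    print(f"fold {fold_no}: train={len(train_idx)}, val={len(val_idx)} ({share:.1%})")
```

With five folds, each internal validation set holds roughly one fifth of the DH patients, which gives the 8:2 ratio. In the design described above, the model trained in each fold would additionally be scored on the full SYSMH and SSH datasets, which never enter training.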
Table 2
Model performance metrics in the three centers
Model | ACC | Precision | Recall | AUC | F1 Score | SEN | SPE |
|---|
DH |
CatBoost | 0.789 (0.766–0.825) | 0.755 (0.700–0.812) | 0.694 (0.656–0.727) | 0.861 (0.835–0.898) | 0.722 (0.701–0.764) | 0.694 (0.656–0.727) | 0.852 (0.798–0.892) |
GradientBoost | 0.791 (0.770–0.823) | 0.753 (0.727–0.803) | 0.702 (0.648–0.736) | 0.859 (0.838–0.894) | 0.726 (0.698–0.765) | 0.702 (0.648–0.736) | 0.849 (0.821–0.885) |
LightGBM | 0.780 (0.766–0.815) | 0.738 (0.702–0.793) | 0.688 (0.645–0.721) | 0.855 (0.835–0.889) | 0.711 (0.689–0.754) | 0.688 (0.645–0.721) | 0.840 (0.809–0.879) |
XGBoost | 0.774 (0.756–0.896) | 0.727 (0.679–0.754) | 0.688 (0.644–0.720) | 0.846 (0.827–0.875) | 0.706 (0.683–0.733) | 0.688 (0.644–0.720) | 0.831 (0.797–0.852) |
AdaBoost | 0.783 (0.768–0.803) | 0.740 (0.716–0.769) | 0.697 (0.661–0.726) | 0.845 (0.826–0.875) | 0.717 (0.695–0.739) | 0.697 (0.661–0.726) | 0.840 (0.811–0.863) |
SSH |
CatBoost | 0.748 (0.735–0.762) | 0.681 (0.669–0.690) | 0.834 (0.817–0.867) | 0.833 (0.828–0.840) | 0.750 (0.736–0.767) | 0.834 (0.817–0.867) | 0.677 (0.665–0.696) |
GradientBoost | 0.734 (0.725–0.744) | 0.670 (0.663–0.677) | 0.813 (0.792–0.837) | 0.825 (0.820–0.831) | 0.734 (0.725–0.748) | 0.813 (0.792–0.837) | 0.668 (0.650–0.685) |
LightGBM | 0.737 (0.731–0.755) | 0.674 (0.662–0.690) | 0.812 (0.792–0.836) | 0.821 (0.814–0.830) | 0.737 (0.728–0.755) | 0.812 (0.792–0.836) | 0.676 (0.651–0.691) |
XGBoost | 0.728 (0.708–0.746) | 0.666 (0.649–0.681) | 0.804 (0.773–0.829) | 0.821 (0.806–0.830) | 0.728 (0.706–0.747) | 0.804 (0.773–0.829) | 0.666 (0.644–0.681) |
AdaBoost | 0.741 (0.725–0.764) | 0.677 (0.657–0.699) | 0.817 (0.786–0.842) | 0.819 (0.808–0.830) | 0.740 (0.722–0.764) | 0.817 (0.786–0.842) | 0.678 (0.642–0.701) |
SYSMH |
CatBoost | 0.808 (0.803–0.816) | 0.707 (0.698–0.719) | 0.802 (0.788–0.812) | 0.876 (0.871–0.880) | 0.752 (0.747–0.761) | 0.802 (0.788–0.812) | 0.811 (0.802–0.820) |
GradientBoost | 0.802 (0.794–0.810) | 0.702 (0.688–0.708) | 0.790 (0.769–0.809) | 0.872 (0.869–0.873) | 0.743 (0.733–0.755) | 0.790 (0.769–0.809) | 0.809 (0.795–0.814) |
LightGBM | 0.807 (0.801–0.811) | 0.706 (0.695–0.721) | 0.801 (0.779–0.820) | 0.875 (0.873–0.877) | 0.750 (0.744–0.755) | 0.801 (0.779–0.820) | 0.810 (0.797–0.828) |
XGBoost | 0.800 (0.795–0.807) | 0.697 (0.685–0.704) | 0.793 (0.772–0.810) | 0.873 (0.869–0.879) | 0.742 (0.734–0.751) | 0.793 (0.772–0.810) | 0.804 (0.789–0.811) |
AdaBoost | 0.787 (0.775–0.795) | 0.678 (0.666–0.686) | 0.787 (0.762–0.803) | 0.860 (0.853–0.866) | 0.728 (0.711–0.740) | 0.787 (0.762–0.803) | 0.787 (0.782–0.791) |
Values in parentheses are 95% confidence intervals. DH, Donghua Hospital of Sun Yat-sen University; SYSMH, Sun Yat-sen Memorial Hospital of Sun Yat-sen University; SSH, Dongguan Songshan Lake Donghua Hospital; ACC, accuracy; AUC, area under curve; CI, confidence interval; SEN, sensitivity; SPE, specificity.
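For reference, the threshold-based indicators in Table 2 can all be derived from a single confusion matrix. The sketch below assumes persistent AF (Diagnose = 1) is the positive class; the function name is hypothetical and the counts are invented for illustration, not taken from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the threshold-based metrics reported in Table 2.

    Assumes every denominator is non-zero (both classes present and
    at least one positive prediction).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # recall and sensitivity are the same quantity
    specificity = tn / (tn + fp)
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "Precision": precision,
        "Recall": recall,
        "F1": 2 * precision * recall / (precision + recall),
        "SEN": recall,
        "SPE": specificity,
    }


# Invented counts: 100 persistent AF cases (positive) and 100 paroxysmal AF cases
metrics = classification_metrics(tp=70, fp=20, tn=80, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because recall and sensitivity are defined identically, the Recall and SEN columns of Table 2 carry the same values.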
Among the models built on the DH dataset, GradientBoost had the highest accuracy (0.791, 95% CI: 0.770–0.823), followed by CatBoost (0.789, 95% CI: 0.766–0.825). For AUC, GradientBoost reached 0.859 (95% CI: 0.838–0.894), slightly below the maximum of 0.861 (95% CI: 0.835–0.898) achieved by CatBoost. GradientBoost had the highest sensitivity (0.702, 95% CI: 0.648–0.736) and CatBoost the highest specificity (0.852, 95% CI: 0.798–0.892). In the two independent external validation sets, CatBoost performed best on almost every metric. In SYSMH, it had the highest accuracy (0.808, 95% CI: 0.803–0.816), AUC (0.876, 95% CI: 0.871–0.880), sensitivity (0.802, 95% CI: 0.788–0.812), and specificity (0.811, 95% CI: 0.802–0.820). In SSH, it likewise had the highest accuracy (0.748, 95% CI: 0.735–0.762), AUC (0.833, 95% CI: 0.828–0.840), and sensitivity (0.834, 95% CI: 0.817–0.867); the only exception was specificity, which was highest for AdaBoost (0.678, 95% CI: 0.642–0.701). The AUC for every center and for each fold of the five-fold cross-validation is summarized for the five algorithms in Fig. 2C1-C3. The per-fold AUC results for all centers are also plotted in Supplementary Fig. 1.
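Unlike the other indicators, AUC does not depend on a classification threshold: it equals the probability that a randomly chosen persistent AF case receives a higher predicted score than a randomly chosen paroxysmal AF case. A minimal pairwise sketch of that definition follows; the function name is hypothetical and the labels and scores are invented for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in sample size, so this
    is meant for illustration rather than large datasets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Invented example: 1 = persistent AF, 0 = paroxysmal AF
labels = [0, 0, 1, 1]
predicted = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc(labels, predicted)
print(f"AUC = {auc:.2f}")
```

In this example three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75.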
Interpretation of AF subtype prediction model
Fig. 3A ranks the variables by their mean absolute SHAP values in descending order to show their impact on the model output. The five variables that most strongly affect the predicted AF type are LA, NT-proBNP, Hb, LVEF, and UA. Figure 3B provides a more intuitive view of the relationship between these variables and AF diagnostic types. LA, LVEF, and UA affect the prediction in a consistent pattern. For example, LA displays a gradient from blue to red with a distinct color boundary near a SHAP value of 0, indicating a regular relationship between LA values and AF diagnostic types. Specifically, when LA values are lower, the model tends to predict paroxysmal AF, whereas higher LA values are associated with a prediction of persistent AF. Figure 3C shows the impact of all variables on classification across all samples, with red indicating a positive effect on the model’s prediction (toward persistent AF) and blue indicating a negative effect (toward paroxysmal AF). Figure 3D further illustrates the influence of each variable on the model’s prediction for a specific sample using the SHAP method. Figure 3E shows how the model’s output changes as each of the five most influential variables changes. Compared with Fig. 2B, this view makes the trend for each variable clearer. For example, as LA increases, its contribution shifts the model’s prediction toward persistent AF.
In the model for early differentiation of AF subtypes as paroxysmal or persistent, the top five variables were LA, NT-proBNP, LVEF, UA, and SBP. For these variables, we further examined their pairwise interactions and plotted them as scatter plots (see Supplementary Fig. 2 for details). As shown in Supplementary Fig. 3, we also explored the relationship between these five variables and AF diagnostic subtypes in the three independent centers using restricted cubic spline analysis.
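The SHAP values discussed in this section are Shapley attributions: each variable is credited with its average marginal contribution to the prediction over all orderings of the variables. The sketch below computes them exactly for a toy three-feature scoring function by enumerating feature subsets. It is not the SHAP library and not the study's model: the function names, the coefficients, and the two placeholder features are invented, and replacing absent features with a single baseline value is a simplifying assumption. Only the LA baseline (37 mm, the overall median in Table 1) and the positive direction of the LA effect are taken from the text.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value
    (a simplifying assumption of this sketch).
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_score(v):
    """Invented score: v[0] is LA in mm, v[1] and v[2] are placeholder features."""
    la, b, c = v
    return 0.04 * la + 0.5 * b - 0.3 * c + 0.1 * b * c


sample = [45.0, 1.0, 1.0]      # hypothetical patient with an enlarged left atrium
reference = [37.0, 0.0, 0.0]   # LA baseline = overall median from Table 1
phi = shapley_values(toy_score, sample, reference)
print([round(p, 3) for p in phi])
```

The attributions sum to the difference between the score for the sample and the score for the baseline, which is the property that lets per-sample plots such as Fig. 3D decompose a single prediction into per-variable contributions. In this toy setting the larger LA receives a positive attribution, mirroring the shift toward persistent AF described above.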
Performance and explanation of AF subtype prediction models in subgroups
We divided the participants of every center into six subgroups by sex and age: male under 60 years old, female under 60 years old, male 60–65 years old, female 60–65 years old, male 65 years old and above, and female 65 years old and above. The model achieved good prediction performance in all subgroups. Supplementary Fig. 4 presents the AUC results for each center, grouped by sex within each age subgroup. Panels A-F in Supplementary Fig. 4 are ordered by age, and centers 1–3 are DH, SSH, and SYSMH, respectively. These results and the values of the other evaluation indicators are given in Supplementary Tables 2–4. As for the overall model, we used the SHAP method to visualize the interpretability of the subgroup models. The SHAP plots of all subgroups are summarized in Supplementary Fig. 5. In every subgroup, LA and NT-proBNP were the most and second most influential variables, consistent with the overall model.