The final cohort included 66 patients (mean age 66 ± 11.4 years, 39% female), of whom 44% had an ECAS total score below 108 (Table 1). The ECAS total score was not correlated with age or disease duration. The majority of patients had a spinal onset (68%), and nine individuals (14%) carried a C9orf72 repeat expansion.
Table 1
Cohort. Characteristics of the patients included in the study. N, number; SD, standard deviation; C9orf72, C9 open reading frame 72; ECAS, Edinburgh cognitive and behavioural ALS screen; IQR, interquartile range; ALSFRS-R, ALS functional rating scale–revised.
| Variable | N = 66 |
|---|---|
| Age at sampling, years, mean (SD) | 66 (11.4) |
| Sex, N females (%) | 26 (39) |
| Site of onset, N (%) | |
| Spinal | 45 (68) |
| Bulbar | 19 (29) |
| Other^a | 2 (3) |
| C9orf72 repeat expansion, N (%) | 9 (14) |
| ECAS, median (IQR) | 109 (21.5) |
| ECAS < 108, N (%) | 29 (44) |
| ALSFRS-R, median (IQR) | 41.5 (8.25) |

^a Respiratory.
Neurofilament levels do not correlate to known markers of cognitive function
Pairwise correlation analysis of the 47 CSF proteins revealed two main protein clusters, one with strongly correlating proteins (n = 26, median ρ 0.81, IQR 0.14) including proteins commonly regarded as markers for dementia and cognitive function such as neurogranin (NRGN), beta-synuclein (SNCB) and neuromodulin (GAP43) (Fig. 1). The other protein clusters (n = 18 and n = 3) showed generally lower co-variation between proteins (median ρ 0.54, IQR 0.23, and median ρ 0.34, IQR 0.14, respectively). Neurofilament medium (NEFM), which belonged to the small cluster of three proteins, did not exhibit strong correlations with the neuronal proteins in the first cluster (median ρ 0.20, IQR 0.18). Furthermore, chitinase 1 (CHIT1), included in the same cluster as NEFM, displayed a unique correlation profile (median ρ −0.01, range −0.08 to 0.07, IQR 0.04).
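As an illustration of how within-cluster summaries such as "median ρ, IQR" can be computed, the following is a minimal sketch. It assumes that ρ denotes Spearman's rank correlation and that cluster membership is already known; neither the clustering method nor the software used is stated in this section. The protein names and values are toy placeholders, not study data.

```python
from itertools import combinations
from statistics import median, quantiles


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cluster_summary(levels, members):
    """Median and IQR of pairwise rho among the proteins listed in `members`."""
    rhos = [spearman(levels[a], levels[b]) for a, b in combinations(members, 2)]
    q1, _, q3 = quantiles(rhos, n=4)
    return median(rhos), q3 - q1


# Toy data with hypothetical protein names (NOT study data).
toy_levels = {
    "P1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "P2": [1.1, 2.3, 2.9, 4.2, 5.1, 6.5],
    "P3": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "P4": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
}
rho_12 = spearman(toy_levels["P1"], toy_levels["P2"])
rho_14 = spearman(toy_levels["P1"], toy_levels["P4"])
med, iqr = cluster_summary(toy_levels, ["P1", "P2", "P3"])
```

The same summary applied between one protein and all members of another cluster would give values comparable to those quoted for NEFM and CHIT1, under the assumptions above.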
CSF proteins are associated with ECAS total score
We applied elastic net as a variable selection method to identify proteins with a relevant association with ECAS total score. In the final elastic net model, the optimal hyperparameters were α = 0.54 and λ = 3.63. This indicates that the final model used a nearly balanced mix of LASSO and ridge regression penalties, reflecting both the need for variable selection and stability in the presence of potential multicollinearity. The model identified seven proteins as the most predictive of ECAS total score: NEFM, neuronal pentraxin 2 (NPTX2), GAP43, insulin-like growth factor binding protein 4 (IGFBP4), insulin-like growth factor binding protein 7 (IGFBP7), osteopontin (SPP1) and cadherin 8 (CDH8) (Table 2). The model produced an RMSE of 13.76 on the training set and 12.03 on the test set. The slightly lower RMSE on the test set suggests that the model not only fit the training data well but also generalized to unseen data, with no indication of overfitting.
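To make the roles of α and λ concrete, the elastic net objective can be written as follows. This assumes the widely used glmnet-style parameterization, in which α mixes the two penalties and λ scales their overall strength; the section itself does not state which parameterization or software was used, so the expression is illustrative only.

```latex
\hat{\beta} \;=\; \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^{2}
  \;+\; \lambda\Bigl[\,\alpha\,\lVert\beta\rVert_{1}
  \;+\; \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_{2}^{2}\Bigr]

% With the reported values \alpha = 0.54 and \lambda = 3.63:
%   weight on the LASSO term:  \alpha\lambda         = 0.54 \times 3.63       \approx 1.96
%   weight on the ridge term:  (1-\alpha)\lambda / 2 = 0.46 \times 3.63 / 2   \approx 0.83
```

Under this convention, α = 1 gives pure LASSO and α = 0 pure ridge, so α = 0.54 sits close to the midpoint.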
Table 2
Candidate protein and protein pairs. Results from the elastic net and linear regression analyses.
| Candidate | Linear regression: β coefficient (95% CI) | Linear regression: CV R² | Selected in elastic net of single proteins | Selected in elastic net of protein pairs |
|---|---|---|---|---|
| Single protein | | | | |
| IGFBP7 | 4.05 (0.11 to 7.98) | 0.27 | Yes | No |
| NPTX2 | 2.83 (−1.09 to 6.76) | 0.29 | Yes | Yes |
| CDH8 | 2.57 (−1.37 to 6.50) | 0.25 | Yes | Yes |
| PTPRN2 | 2.26 (−1.82 to 6.34) | 0.24 | No | Yes |
| IGF2 | 2.07 (−2.04 to 6.19) | 0.13 | No | Yes |
| CHL1 | 1.62 (−2.52 to 5.76) | 0.26 | No | Yes |
| CADM2 | 1.07 (−3.03 to 5.16) | 0.34 | No | Yes |
| BASP1 | 0.67 (−3.50 to 4.84) | 0.18 | No | Yes |
| IGFBP4 | −0.06 (−4.37 to 4.24) | 0.21 | Yes | Yes |
| GAP43 | −1.25 (−5.41 to 2.91) | 0.23 | Yes | Yes |
| NEFM | −2.58 (−6.92 to 1.77) | 0.22 | Yes | No |
| SPP1 | −3.45 (−7.34 to 0.45) | 0.47 | Yes | Yes |
| Protein pair | | | | |
| PTPRN2/GAP43 | 7.48 (3.66 to 11.29) | 0.34 | | |
| CDH8/GAP43 | 6.77 (2.68 to 10.85) | 0.32 | | |
| CHL1/GAP43 | 6.49 (2.38 to 10.61) | 0.30 | | |
| NPTX2/SPP1 | 5.24 (1.49 to 8.99) | 0.26 | | |
| PTPRN2/BASP1 | 4.94 (0.66 to 9.22) | 0.19 | | |
| CADM2/GAP43 | 3.50 (−0.63 to 7.63) | 0.22 | | |
| NPTX2/IGFBP4 | 3.05 (−1.27 to 7.37) | 0.28 | | |
| IGF2/IGFBP4 | 2.92 (−1.24 to 7.07) | 0.27 | | |

CI = confidence interval; CV R² = 10-fold cross-validated R².
Protein ratios are superior to single proteins for predicting ECAS score
As protein pairs have been shown to provide stronger associations with cognitive function compared to single proteins in other neurodegenerative disorders, we also assessed protein ratios for their ability to detect cognitive impairment in ALS. All 47 proteins were combined into pairs (n = 2162). Again, we used elastic net to find the pairs most predictive of ECAS total score. In the elastic net model of ratios, the optimal hyperparameters were found to be α = 0.37 and λ = 10.38. Here, 8 protein pairs were identified as the most predictive of ECAS total score (Table 2, Fig. 2A). The model produced an RMSE of 12.93 on the training set and 12.11 on the test set.
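The reported number of pairs (n = 2162) equals 47 × 46, which is consistent with counting ordered pairs, so that A/B and B/A are treated as separate candidates. The short sketch below checks this count and shows how a ratio could be formed. It assumes protein levels are already on a log2 scale, where a ratio becomes a difference; the placeholder names and the helper `log2_ratio` are hypothetical.

```python
from itertools import permutations

N_PROTEINS = 47

# Hypothetical placeholder names; the real panel is not listed in this section.
protein_names = [f"protein_{i:02d}" for i in range(1, N_PROTEINS + 1)]

# Ordered pairs (numerator, denominator): A/B and B/A are both included.
ordered_pairs = list(permutations(protein_names, 2))
n_pairs = len(ordered_pairs)


def log2_ratio(log2_levels, numerator, denominator):
    """Per-sample log2(numerator / denominator), assuming inputs are log2 levels."""
    return [a - b for a, b in zip(log2_levels[numerator], log2_levels[denominator])]


# Toy values for two samples (NOT study data).
toy = {"protein_01": [10.0, 11.5], "protein_02": [9.0, 12.0]}
toy_ratio = log2_ratio(toy, "protein_01", "protein_02")
```

Because A/B and B/A differ only in sign on the log2 scale, the two orderings carry the same information in a linear model; whether both were retained for that reason is not stated in the source.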
Five of the proteins selected in the ratio elastic net model were also selected in the single protein model (NPTX2, GAP43, IGFBP4, SPP1 and CDH8). Interestingly, five additional proteins were found relevant as part of a pair, namely insulin-like growth factor 2 (IGF2), cell adhesion molecule L1 like (CHL1), protein tyrosine phosphatase receptor type N2 (PTPRN2), cell adhesion molecule 2 (CADM2) and brain abundant membrane attached signal protein 1 (BASP1). As previously shown, none of these had a strong association with ECAS total score alone.
To further evaluate the association with ECAS total score and compare the performance of single proteins and protein pairs, linear regression models were created with each candidate as the predictor (Table 2, Fig. 2A). We assessed the predictive performance of each model using 10-fold cross-validated R² (CV R²). Comparing the regression models revealed that CV R² values were generally higher in the protein pair models (median 0.27) than in the single protein models (median 0.24) (Supplementary Fig. 3). In particular, protein pair ratios involving GAP43 exhibited larger β coefficient magnitudes, confidence intervals that mostly excluded zero, and higher CV R² values than the single proteins (Table 2).
The PTPRN2/GAP43 ratio showed the strongest association, with an estimated β coefficient of 7.48 (95% CI: 3.66–11.29) and a CV R² of 0.34. Other prominent protein pairs included CDH8/GAP43, CHL1/GAP43 and NPTX2/SPP1. The PTPRN2/GAP43 β coefficient substantially exceeded that of PTPRN2 and GAP43 alone (2.26, 95% CI: −1.82 to 6.34, and −1.25, 95% CI: −5.41 to 2.91, respectively), suggesting that the ratio of these highly correlating proteins is more informative than the individual levels (Fig. 2B). The added value of combining PTPRN2 with GAP43 was further explored using ROC analysis (Fig. 2C). The performance was considerably better for PTPRN2/GAP43, with an area under the curve (AUC) of 0.89 (95% CI: 0.80–0.97), compared to PTPRN2 and GAP43 alone (AUC 0.76, 95% CI: 0.64–0.89, and AUC 0.78, 95% CI: 0.67–0.90, respectively). These results suggest that protein ratios potentially offer a more robust prediction of ECAS total score than single protein measures. The consistency of key marker selection in both elastic net and linear regression analyses further reinforces the potential of these ratios as biomarkers for cognitive impairment in ALS.
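The two performance measures used above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes CV R² is computed from pooled out-of-fold predictions (other definitions, such as averaging per-fold R², exist and the section does not say which was used), and it computes AUC as the probability that a case scores higher than a non-case. All data are synthetic.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - b1 * mx, b1


def cv_r2(x, y, k=10, seed=0):
    """k-fold cross-validated R² from pooled out-of-fold predictions."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(x)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = b0 + b1 * x[i]
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def auc(scores_cases, scores_controls):
    """P(case score > control score), counting ties as one half."""
    wins = sum((c > n) + 0.5 * (c == n)
               for c in scores_cases for n in scores_controls)
    return wins / (len(scores_cases) * len(scores_controls))


# Synthetic, noise-free example: a perfectly linear predictor gives CV R² of 1.
toy_x = [float(i) for i in range(40)]
toy_y = [3.0 * v + 2.0 for v in toy_x]
toy_cv_r2 = cv_r2(toy_x, toy_y)

# If impaired patients have LOWER ratios, negate the ratio so cases score higher.
toy_auc = auc([-1.0, -3.0], [-2.0, -4.0])
```

Unlike the in-sample R², this cross-validated version can be negative when a model predicts held-out data worse than the overall mean does.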
We next evaluated the association of the most promising protein ratios with ECAS subscores. For the PTPRN2/GAP43 ratio, the strongest associations were observed with executive function, verbal fluency, and memory (Supplementary Fig. 4). Similar results were found for the other candidate pairs, suggesting that these ratios are more likely markers of general cognitive dysfunction than of ALS-specific frontotemporal involvement.
The trajectories of CSF PTPRN2/GAP43 are different in males and females with cognitive impairment
The PTPRN2/GAP43 ratio was not associated with ALSFRS-R score, nor did it differ significantly between bulbar and spinal onset or between C9orf72 mutation carriers and non-carriers (Supplementary Fig. 5A, Supplementary Fig. 5B). In addition, we did not find any overall difference in PTPRN2/GAP43 ratio between the sexes (p = 0.16). However, among the patients with cognitive impairment, lower ratios were found in males than in females (p = 0.002), even though the distribution of cognitive impairment and age was similar between the sexes (Fig. 3A). Indeed, in a linear regression with PTPRN2/GAP43 as the outcome and an interaction term between ECAS total score and sex, the slope of the PTPRN2/GAP43 ratio over ECAS total score was steeper for males than for females (0.04 and 0.01, respectively, on a log2 scale) (Fig. 3B).
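The sex-specific slopes can be illustrated with a small sketch. In a linear model containing sex, ECAS total score and their interaction (with no further covariates), the slope implied for each sex is identical to the slope from a separate simple regression within that sex, and the interaction coefficient is the difference between the two. The code below uses that equivalence. The data are synthetic and noise-free, generated with slopes chosen to mimic the reported values; whether the original model included additional covariates is not stated here.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x for a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def slopes_by_sex(ecas, log2_ratio, sex):
    """Slope of the log2 ratio over ECAS total score, fitted within each sex."""
    slopes = {}
    for group in sorted(set(sex)):
        xs = [e for e, s in zip(ecas, sex) if s == group]
        ys = [r for r, s in zip(log2_ratio, sex) if s == group]
        slopes[group] = ols_slope(xs, ys)
    return slopes


# Synthetic toy data (NOT study data): five ECAS scores per sex.
toy_ecas = [80.0, 90.0, 100.0, 110.0, 120.0] * 2
toy_sex = ["male"] * 5 + ["female"] * 5
toy_ratio = [
    0.04 * e - 4.0 if s == "male" else 0.01 * e - 1.0
    for e, s in zip(toy_ecas, toy_sex)
]

slopes = slopes_by_sex(toy_ecas, toy_ratio, toy_sex)

# Equivalent to the ECAS-by-sex interaction coefficient with females as reference.
interaction = slopes["male"] - slopes["female"]
```

Fitting the single interaction model has the advantage of yielding a direct test of whether the two slopes differ, which separate per-sex fits do not provide.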