We conducted a systematic review and meta-analysis in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline161. The full protocol, including the search strategy using controlled vocabulary and keyword terms, is available in the International Prospective Register of Systematic Reviews (PROSPERO) under reference CRD42022327151. The patient, intervention, comparator, outcome (PICO) approach is reported in Supplementary Table 15.
Article selection
We included original peer-reviewed studies, including journal articles and conference papers, published in any language from 1st January 2000 onwards, to focus on studies based on neurophysiological tools and a clinical taxonomy in line with the current standard (i.e., after the proposal of diagnostic criteria for both UWS/VS162 and MCS163). To avoid excluding a large portion of the literature12, we included studies on DoC patients of any etiology with a diagnosis of UWS/VS or MCS based on a validated assessment scale164, in which more than 50% of the participants were (i) adults (≥16 years old) and (ii) in a pDoC (≥28 days post-injury). We included studies of neurophysiological assessments under resting-state, task-free conditions, based on EEG, MEG, fNIRS, [18F]FDG-, [15O]H2O- or other tracer-PET or -SPECT, and structural, perfusion or functional MRI. We included studies comparing patients with pDoC, UWS/VS or MCS against healthy volunteers and against each other. PET, SPECT and MRI studies were included if they ran voxel-based comparisons in the whole-brain gray or white matter, without applying partial volume correction. EEG, MEG and fNIRS studies were included if they ran comparisons of global features of neural activity, obtained from electrodes placed across the entire scalp.
Information sources and search strategy
Studies were identified primarily through bibliographic databases. This approach was complemented by two other sources, namely (i) expert recommendations and (ii) searches of the bibliographies of previously published literature reviews and meta-analyses. The databases MEDLINE via Ovid, Scopus and EMBASE via Elsevier were searched on 14th January 2022. An update was run on 16th June 2023. The search strategy focused on concepts and keywords for disorders of consciousness and neurophysiological tools, and was developed with the support of a health sciences information specialist (MB), as recommended by the Cochrane Handbook for Systematic Reviews of Interventions165. The complete search strategy used for each database is listed in Supplementary Tables 16-18. The list of studies selected from the database search (see Selection process) was submitted to pDoC experts (OG, AT) to obtain recommendations of additional potentially eligible studies; the same list was also compared against the bibliographies of previously published literature reviews and meta-analyses on disorders of consciousness (by BK, SA, ZW, AS) to identify additional potentially eligible studies.
Selection process
The screening process was carried out using the platform Covidence166. Titles and abstracts of studies retrieved using the search strategy were reviewed manually (screening step #1). Studies passing screening step #1 were reviewed manually based on the full text and supplementary materials (screening step #2). Screening steps #1 and #2 were performed by at least two independent referees, among AS, MM, NA, SA, DS, BK, ZW and JA. Full texts were automatically retrieved using Covidence and Zotero (https://www.zotero.org/); supplementary materials were automatically retrieved using the R package suppdata (https://github.com/ropensci/suppdata). Full texts in non-English languages (n=29) were translated using DeepL and a co-author fluent in Chinese (ZW, n=18), Russian (NB, n=6), German (AS, n=2) or Italian (AS, n=1); one Japanese full text (with tables and figures in English) and one Polish full text were translated using DeepL only. Missing full texts and/or information required for the screening process were requested from the study authors via email, phone and/or social media, in English or in the authors' mother tongue. If unrecoverable, information relative to the inclusion criteria, and in particular to the proportion of adult (≥16 years old) or pDoC (≥28 days post-injury) patients in a given study, was estimated based on reported means and standard deviations, assuming a Gaussian distribution (n=3 studies with an estimated proportion of pediatric DoC patients below 50% [range: 0-10%]53,63,106; n=2 studies with an estimated proportion of acute or subacute DoC patients below 50% [range: 3-6%]26,63). Conflicts were resolved through discussion; a third referee (AS or JA) was brought into the discussion when necessary.
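For illustration, the Gaussian estimation step can be sketched as follows in Python with SciPy (the paper does not specify the implementation; the example mean and standard deviation below are hypothetical, not taken from any included study):

```python
from scipy.stats import norm

def estimated_proportion_above(threshold, mean, sd):
    """Estimate the fraction of a sample above `threshold`, assuming the
    variable follows a Gaussian with the reported mean and SD."""
    return norm.sf(threshold, loc=mean, scale=sd)  # survival function = 1 - CDF

# Illustrative values only: a study reporting age 45.2 +/- 14.8 years would
# have an estimated ~98% of participants aged >= 16 years.
print(estimated_proportion_above(16, mean=45.2, sd=14.8))
```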
Data items extracted
The extraction process was carried out using the platform Covidence. A standardized form was used to extract data from the full text and supplementary materials of studies passing screening step #2. Full texts in non-English languages (Chinese, n=4) were translated using DeepL and a co-author fluent in Chinese (ZW).
For EEG, MEG and fNIRS studies, we extracted the mean and standard deviation of any resting-state EEG, MEG or fNIRS global measure for pDoC, UWS/VS, MCS and healthy volunteers. If no data were available, they were requested from the study authors, as detailed above (n=67 studies, of which n=36 shared usable data). Where authors shared individual- and/or electrode- or source-level data (n=14 studies), we computed the mean and standard deviation of the measures of interest across subjects, after averaging over channels/sources when necessary. All data received from the authors were inspected by MM and compared to the results reported in the original studies. Discrepancies were discussed with the authors and, if they could not be resolved, the data were excluded (n=3 studies). In case of no response, we pursued several strategies: (i) when data were available in subgroups, we computed the combined mean and standard deviation of the pooled group of interest based on Cochrane's formulas165 (n=30 studies); (ii) when the median, interquartile range, minimum/maximum and/or standard error were available, we estimated the mean and standard deviation based on the formulas by Wan and colleagues167 (n=0 studies); (iii) when sufficient information was available in published plots, we used PlotDigitizer (https://plotdigitizer.com/) (n=13 studies) to extract the mean and standard deviation of groups (n=2 studies) and/or subgroups of interest (see strategy i, n=9 studies), the individual values (n=2 studies), and/or the median, interquartile range, minimum/maximum and/or standard error (n=4 studies) from the high-resolution figures published in the full text or supplementary materials. The provenance of the statistics for each included study is detailed in Supplementary Table 19.
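As an illustration of strategy (i), the Cochrane Handbook formula for combining two subgroups into a single group can be implemented as follows (a minimal sketch; the function name and example values are ours, not taken from any included study):

```python
import math

def pool_two_groups(n1, m1, sd1, n2, m2, sd2):
    """Combine two subgroups into a single group (Cochrane Handbook formula).
    Returns the pooled sample size, mean and standard deviation."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2
           + (n1 * n2 / n) * (m1 - m2)**2) / (n - 1)
    return n, mean, math.sqrt(var)

# Example with made-up subgroup statistics (e.g., UWS/VS and MCS subgroups
# pooled into a single pDoC group); more than two subgroups can be pooled
# by applying the function iteratively.
n, m, sd = pool_two_groups(12, 0.31, 0.08, 18, 0.42, 0.10)
```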
For PET, SPECT and MRI studies, we extracted the MNI or Talairach x, y and z peak coordinates of significant voxel-wise differences in resting-state PET, SPECT or MRI measures between pDoC, UWS/VS or MCS patients and healthy volunteers, or between patient groups. If no coordinates were reported, they were requested from the study authors (n=20 studies, of which n=13 shared usable data), as detailed above. If authors shared data in the form of thresholded statistical maps (n=4 studies), we used the SPM12 spm_max function to extract peak coordinates, with default settings, i.e., three coordinates at least 8 mm apart extracted from each significant cluster. Where authors shared unthresholded statistical maps (n=2 studies), an intensity-based and cluster-extent-based threshold was applied, as per the original study. If no cluster-extent-based threshold was explicitly reported, we applied a cluster-extent threshold of 100 voxels to reduce the risk of false positives and decrease the noise in the meta-analytical input. All data received from the authors were inspected by AS and compared to the results reported in the original studies. Discrepancies were discussed with the authors and, if they could not be resolved, the data were excluded (n=0 studies). Talairach coordinates were converted to MNI space (n=4 studies). The provenance of the statistics for each included study is detailed in Supplementary Tables 20 and 21.
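As a sketch of the thresholding step applied to author-provided unthresholded maps, the following Python code applies an intensity threshold followed by a 100-voxel cluster-extent threshold using nibabel and SciPy (the original processing may differ in details such as cluster connectivity; the file name and intensity threshold are illustrative):

```python
import nibabel as nib
import numpy as np
from scipy import ndimage

def cluster_extent_threshold(img_path, z_thresh, k_extent=100):
    """Apply an intensity threshold, then remove clusters smaller than
    `k_extent` voxels (26-connectivity) from an unthresholded map."""
    img = nib.load(img_path)
    data = np.asarray(img.get_fdata())
    mask = data > z_thresh
    labels, n_clusters = ndimage.label(mask, structure=np.ones((3, 3, 3)))
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n_clusters + 1))
    keep = np.isin(labels, 1 + np.flatnonzero(sizes >= k_extent))
    out = np.where(keep, data, 0)
    return nib.Nifti1Image(out, img.affine, img.header)

# Hypothetical usage: threshold an author-provided unthresholded map at the
# study's reported voxel-level threshold plus a 100-voxel extent.
# thr_img = cluster_extent_threshold("unthresholded_zmap.nii.gz", z_thresh=3.1)
```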
For all studies, we extracted information on overlap with previous studies and on quality of evidence. We also extracted information on the number of participants, demographics (age, sex), clinical characteristics (diagnostic procedure and diagnosis, etiology and disease duration), data acquisition (time, center, neurophysiological tools and participants' set-up, including sedation), data processing and quantification, and statistical comparisons and thresholding. Summary descriptive statistics for demographic and clinical information were computed as cumulative frequencies for qualitative variables, and as the mean and standard deviation of the cumulative Gaussian probability density function for quantitative variables (in-house code will be made available on GitHub, https://github.com/GIGA-Consciousness). The Gaussian probability density functions were estimated based on the mean, standard deviation, minimum and maximum of each study; if the minimum and maximum were not available, they were estimated as the study mean ± 3 × standard deviation (capped to the extreme minimum and maximum reported in the remaining studies). Statistical comparisons of qualitative variables were carried out based on Pearson's Chi-square test. Statistical comparisons of quantitative variables were carried out based on two-sample t-tests, after testing for equality of variances based on the F-test. Effect size was computed as Hedges' g. All descriptive statistics and statistical tests were computed using the SciPy package in Python 3.12.
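The group-comparison pipeline described above can be sketched in SciPy as follows (a minimal illustration; the exact implementation of the in-house code may differ):

```python
import numpy as np
from scipy import stats

def compare_groups(x, y, alpha=0.05):
    """Two-sample comparison as described in the text: F-test for equality of
    variances, then a two-sample t-test (Welch's if variances differ),
    with Hedges' g as effect size."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    # F-test for equality of variances (two-tailed)
    f = np.var(x, ddof=1) / np.var(y, ddof=1)
    p_f = 2 * min(stats.f.cdf(f, n1 - 1, n2 - 1), stats.f.sf(f, n1 - 1, n2 - 1))
    equal_var = p_f >= alpha
    t, p_t = stats.ttest_ind(x, y, equal_var=equal_var)
    # Hedges' g: mean difference over pooled SD, with small-sample correction
    sp = np.sqrt(((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1))
                 / (n1 + n2 - 2))
    g = (np.mean(x) - np.mean(y)) / sp * (1 - 3 / (4 * (n1 + n2) - 9))
    return {"t": t, "p": p_t, "equal_var": equal_var, "hedges_g": g}
```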
Quality of evidence, risk of bias and heterogeneity
Quality of evidence was evaluated based on a modified version of the Newcastle-Ottawa Scale (NOS) for case-control studies, covering bias in selection, comparability and exposure (Supplementary Table 1).
For EEG, MEG and fNIRS studies, we evaluated heterogeneity based on Cochran's Q test and the I² statistic. Publication bias due to missing results was evaluated, whenever appropriate, based on funnel plots and Pustejovsky and Rodgers' modified version of Egger's test109 (for comparisons based on at least ten studies165), using the R package meta. We used the modified version instead of the original because the original has been shown to be more prone to Type I errors168. The Trim and Fill method was applied to assess the influence of publication bias on the pooled effect size.
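Heterogeneity statistics were obtained from the R package meta; for illustration, Cochran's Q and I² can be computed from per-study effect sizes and sampling variances as follows (a Python sketch, not the code used in the analysis):

```python
import numpy as np
from scipy import stats

def cochran_q_i2(effects, variances):
    """Cochran's Q and the I^2 statistic from per-study effect sizes and
    their sampling variances (fixed-effect inverse-variance weights)."""
    y = np.asarray(effects, float)
    w = 1.0 / np.asarray(variances, float)
    y_bar = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_bar) ** 2)
    df = len(y) - 1
    p = stats.chi2.sf(q, df)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, p, i2
```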
For PET, SPECT and MRI studies, we evaluated heterogeneity in each meta-analytical cluster based on an analysis of contributions, in which the average non-linear contribution of each experiment is tested via a jack-knife approach. Risk of bias due to missing results was not evaluated, as such assessments are designed for meta-analyses of effect sizes, in which the research question is whether an effect is present, rather than for meta-analyses of spatial consistency, in which the question is where the effect is present.
Meta-analysis of global electrophysiological features
Meta-analysis of EEG, MEG and fNIRS findings was performed to evaluate the magnitude of differences in global neurophysiological measures between pDoC, UWS/VS and MCS patients and/or healthy volunteers. Because heterogeneity was a priori expected, we used random-effects meta-analysis via the R package meta. Classical inverse-variance random-effects meta-analyses were applied with the restricted maximum likelihood estimator of tau, as per default settings. Effect sizes were computed using Hedges' g, corresponding to the mean difference divided by the pooled and weighted standard deviation. Results were deemed significant at p<0.05.
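For illustration, the inverse-variance random-effects pooling step can be sketched as follows; note that this sketch uses the simpler DerSimonian-Laird tau² estimator, whereas the analysis itself relied on the REML estimator implemented in the R package meta:

```python
import numpy as np
from scipy import stats

def random_effects_pool(effects, variances):
    """Inverse-variance random-effects pooling of per-study effect sizes.
    Uses the DerSimonian-Laird tau^2 estimator for simplicity (the analysis
    itself used REML via the R package meta)."""
    y = np.asarray(effects, float)
    v = np.asarray(variances, float)
    w = 1.0 / v
    q = np.sum(w * (y - np.sum(w * y) / np.sum(w)) ** 2)
    df = len(y) - 1
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_star = 1.0 / (v + tau2)
    mu = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    z = mu / se
    p = 2 * stats.norm.sf(abs(z))
    return {"pooled_g": mu, "se": se, "p": p, "tau2": tau2}
```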
The mean and standard deviation of any EEG, MEG or fNIRS feature were included in separate meta-analyses, provided the measure of interest was computed at the global level (i.e., based on the signal of usable electrodes placed across the entire scalp). Global features were grouped into subfamilies of analogous EEG features belonging to six families (power spectral density, connectivity, graph theory, microstates, entropy and complexity) and six frequency bands (delta, theta, alpha, beta, gamma or broadband). The power spectral density family included features describing the distribution of the signal's frequency contents; the connectivity family included features of phase and/or amplitude synchronization across brain locations; the graph theory family, split into different subfamilies, each included features reflecting a specific aspect of connectivity-derived brain network organization; the microstate family included features related to transient, quasi-stable topographical patterns of brain activity; the entropy family included features quantifying the unpredictability of the signal; and the complexity family included features relative to the degree of organization and the amount of information necessary to describe the signal (Supplementary Table 2).
The primary comparisons of interest were pDoC patients against healthy controls, and UWS/VS patients against MCS patients. Secondary comparisons of interest were UWS/VS patients against healthy controls, MCS patients against healthy controls, and MCS subgroups (MCS-/MCS+) against any other group. We ran a meta-analysis for each individual subfamily of global features for which at least five different studies in independent clinical samples were available, according to best practices108. This resulted in a meta-analysis of at least one subfamily of interest for each primary and secondary comparison, with the exception of comparisons involving MCS subgroups, for which not enough studies were available. Known overlaps in clinical samples within the same study (i.e., in the case of multi-measure studies) and across studies (i.e., in the case of repeated inclusion of the same patients) were dealt with by excluding overlapping studies and testing the robustness of the results by means of sensitivity analyses. Sensitivity analyses were run to evaluate the robustness of the meta-analysis results to the inclusion/exclusion of (i) specific EEG sub-bands (when data on multiple sub-bands were reported in the same patients), (ii) specific EEG features (when data on multiple EEG, MEG and fNIRS features of the same family were reported in the same patients), (iii) markers obtained from imputation (e.g., mean obtained from the median; mean and standard deviation computed from extraction of data from published charts), (iv) the clinical subpopulation with suspected covert consciousness, and (v) outliers (by leave-one-out sensitivity analysis, as sketched below), defined based on visual evaluation of the forest plots.
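The leave-one-out sensitivity analysis in point (v) can be sketched as follows, reusing the random_effects_pool helper from the sketch above (the actual analysis was run with the R package meta):

```python
def leave_one_out(effects, variances):
    """Re-run the pooled estimate leaving out one study at a time, to check
    whether any single study drives the result. Relies on random_effects_pool
    defined in the previous sketch."""
    results = []
    for i in range(len(effects)):
        eff = [e for j, e in enumerate(effects) if j != i]
        var = [v for j, v in enumerate(variances) if j != i]
        results.append((i, random_effects_pool(eff, var)["pooled_g"]))
    return results
```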
Coordinate-based meta-analysis of functional, structural and molecular neuroimaging findings
Coordinate-based meta-analysis of PET, SPECT and MRI findings was performed to evaluate the spatial consistency of neuroimaging findings in studies comparing pDoC, UWS/VS or MCS patients against healthy volunteers and against each other, using a random-effects analysis of convergence over experiments16,169, i.e., over comparisons of interest for which at least one coordinate was reported by a given study. Coordinate-based meta-analysis was performed via activation likelihood estimation (ALE), by means of in-house MATLAB code. In ALE, coordinates (also called foci) in 3-dimensional MNI or Talairach stereotactic space, obtained from different studies, are spatially normalized to a single template and smoothed with a Gaussian kernel to account for spatial uncertainty. The smoothing kernel dimensions are determined by the sample size of the experiment. The activation likelihood of each voxel is computed based on the union of the smoothed values, indicating the probability that at least one 'true' peak activation lies within this voxel16,169. Results were deemed significant at a cluster-level FWE-corrected threshold of p<0.05, with a voxel-level uncorrected cluster-forming threshold of p<0.001.
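For illustration only, the core ALE principle (modelled activation maps combined by probabilistic union) can be sketched as follows; this is not the in-house MATLAB implementation, and the fixed kernel width and per-map rescaling stand in for the sample-size-dependent kernel and normalization of the real algorithm:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ale_map(experiments, shape=(91, 109, 91), sigma_vox=3.0):
    """Minimal illustration of the ALE principle: each experiment's foci are
    smoothed into a modelled activation (MA) map, and MA maps are combined
    voxel-wise by probabilistic union. `experiments` is a list of arrays of
    voxel indices (one array of foci per experiment)."""
    ale = np.zeros(shape)
    for foci in experiments:
        ma = np.zeros(shape)
        for i, j, k in foci:
            ma[i, j, k] = 1.0
        ma = gaussian_filter(ma, sigma_vox)
        ma = np.clip(ma / ma.max(), 0, 1)  # rescale to a per-voxel probability
                                           # (a simplification of the real MA normalization)
        ale = 1.0 - (1.0 - ale) * (1.0 - ma)  # union of probabilities
    return ale
```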
Coordinates derived from comparisons of any PET, SPECT or MRI measure were included in the meta-analysis, provided the comparisons were run at the whole-brain level (i.e., including at minimum the telencephalon, whereas the brainstem and cerebellum might not have been included).
We considered comparisons of both decreased cerebral integrity (e.g., decreased gray matter density, metabolism or connectivity in patients compared to healthy volunteers) and relatively preserved cerebral integrity. The primary comparisons of interest were decreases/relative preservation of cerebral integrity in pDoC patients against healthy controls, and in UWS/VS patients against MCS patients. Secondary comparisons of interest were decreases/relative preservation of cerebral integrity in UWS/VS patients against healthy controls, in MCS patients against healthy controls, and in MCS subgroups (MCS-/MCS+) against any other group. We ran coordinate-based meta-analyses for comparisons for which at least 20 experiments were available, according to best practices111. This resulted in a meta-analysis for the primary comparison of loss of cerebral integrity in pDoC patients against healthy controls. We further report exploratory results of coordinate-based meta-analyses for comparisons for which at least 10 experiments were available170. This resulted in exploratory meta-analyses of relative preservation of cerebral integrity in pDoC patients against healthy controls and of decreased cerebral integrity in UWS/VS against MCS patients (primary comparisons), and in UWS/VS and MCS patients, respectively, against healthy controls (secondary comparisons). An insufficient number of studies was available for meta-analyses of the MCS- and MCS+ subgroups. Known overlaps in clinical samples within the same study (i.e., in the case of multimodal studies) and across studies (i.e., in the case of repeated inclusion of the same patients) were handled by pooling the coordinates within the same experiment. Tags were included in the ALE input to evaluate the contribution of specific clinical sub-populations (UWS/VS, MCS) and imaging modalities (PET/SPECT, sMRI, fMRI) to the results. Sensitivity analyses were performed on unpooled coordinates to evaluate the impact of sample overlap on the main results via contribution analysis, for coordinates of any imaging modality and separately for coordinates of each imaging modality. We characterized the topography of the meta-analytical findings using the REX toolbox (https://web.mit.edu/swg/software.htm) and (i) the Consensual Atlas of REsting-state Networks (CAREN)171, (ii) the Human Connectome Project multi-modal cortical parcellation 1.0 (HCP-MMP1) atlas172, (iii) the Morel histological atlas of the human thalamus173 and (iv) the 7-subdivision PET-MRI probabilistic atlas of the striatum134.
Spatial correlation with an independent panel of multi-level neurobiological data
Following a rationale similar to the one recently proposed174, we relied on the unthresholded voxel-wise maps of Z-statistics generated by ALE for the primary comparisons of interest and assessed their spatial correlation with an independent panel of multi-level neurobiological data, consisting of 65 different brain maps. Neurobiological brain maps were selected from the Neuromaps toolbox database (https://netneurolab.github.io/neuromaps). Only brain maps with data available for both hemispheres and generated from at least n=10 observations were included; cortical surface maps were excluded when the same data were already available in the form of a whole-brain volume; whole-brain volumes were excluded when based on low-resolution imaging data (SPECT). This resulted in a final database of 28 cortical surface maps of microstructure, functional MRI connectivity, electrophysiology, developmental and evolutionary expansion, and functional organization, derived from a total of 2,592 observations, and 37 whole-brain volumes of neural activity, neurotransmission, microglia and functional organization, derived from a total of 2,623 observations (not counting the functional organization maps derived from the Neurosynth meta-analytical database of 14,371 studies).
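For illustration, annotations can be browsed and retrieved from the Neuromaps database as sketched below (assuming the documented neuromaps Python API; the specific annotation tags shown are examples for illustration only, not the curated 65-map panel, and may need adjusting to the current database entries):

```python
from neuromaps import datasets

# Browse the annotations available in the neuromaps database
annotations = datasets.available_annotations()
print(len(annotations), "annotations available")

# Fetch one example volumetric annotation (tags are illustrative and may
# need to be matched against the entries returned above)
annot = datasets.fetch_annotation(source='neurosynth', desc='cogpc1',
                                  space='MNI152', res='2mm')
```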
We performed spatial transformation, spatial correlation and significance testing based on comparison to a spatial null distribution using the Neuromaps toolbox (version 0.0.5) running on the GIGA high-performance computing (HPC) system (https://giga-bioinfo.gitlabpages.uliege.be/docs/mass-storage-and-cluster/cluster/overview.html)175. Due to the extreme computational load of some of the algorithms used to compute spatial nulls for brain volumes, we down-sampled the unthresholded ALE voxel-wise map of Z-statistics (source map) from a spatial resolution of 2 mm to 3 mm. This procedure ensures computational sustainability without affecting the results of the statistical comparisons for either surface or volume target maps, as demonstrated by the almost perfect correlation (Pearson's R≈1) between the p-values obtained from the 2 mm and 3 mm source maps (data available upon request). The source map was then transformed into the native space of each of the 65 multimodal brain maps (target maps), to which it was correlated. The significance of Pearson's correlations was (spin-)tested against two-sided spatial autocorrelation-preserving null models. Results were considered significant at p<0.05, Bonferroni-corrected for multiple comparisons within map type. For surface target maps, we generated spatial nulls by means of the Alexander-Bloch method, which generates spatially-constrained null distributions by applying random rotations to spherical projections of the brain176. For volume target maps, we generated spatial nulls by means of the Burt method, which generates spatially-constrained null distributions by source-to-nulls variogram matching, in order to retain the spatial autocorrelation of the original source map177. We used an optimized knn parameter (knn=800), determined via visual inspection of the fit between the source and null variograms generated with knn parameters in the range of 500 to 16,000 using BrainSMASH 0.11.0177. Because of a possibly higher likelihood of false positives shown by the Burt method (demonstrated for parcellated surface data by Markello and colleagues178), we cross-validated our findings by generating spatial nulls by means of the Moran method179, which generates spatially-constrained null distributions by using spatial eigenvectors as an estimate of spatial autocorrelation. For each map and method, we generated a minimum of 1,000 null maps, which were then correlated with the source maps to provide a null distribution of correlation coefficients, and estimated a two-tailed p-value for the original correlation. The exact number of null maps generated for each map and method was defined by first estimating the two-tailed p-value based on 1,000 null maps and its 95% Wilson confidence interval based on the binomial distribution180. If the target p-value fell within this 95% confidence interval, a more precise estimation was performed based on >1,000 maps. The final number of null maps varied from 1,000 to 50,000, depending on the map and method. This procedure allows for a flexible, efficient use of computational resources, while ensuring a robust assessment of statistical significance in statistical frameworks where null distributions are randomly generated.
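The adaptive procedure for choosing the number of null maps can be sketched as follows, assuming that the "target p-value" corresponds to the Bonferroni-corrected significance threshold and that a standard permutation p-value convention is used (both are assumptions; the exact conventions follow the Neuromaps toolbox):

```python
import numpy as np

def wilson_interval(k, n, z=1.96):
    """95% Wilson score interval for a proportion k/n (here, the permutation
    p-value estimated from n null maps)."""
    p_hat = k / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def needs_more_nulls(obs_r, null_rs, alpha_corrected):
    """Two-tailed permutation p-value from the null correlations, plus the
    decision rule described in the text: refine with more null maps if the
    (Bonferroni-corrected) threshold falls inside the p-value's 95% Wilson CI."""
    null_rs = np.asarray(null_rs)
    n = len(null_rs)
    k = int(np.sum(np.abs(null_rs) >= abs(obs_r)))
    p = (k + 1) / (n + 1)  # a common permutation-test convention (assumption)
    lo, hi = wilson_interval(k, n)
    return p, (lo <= alpha_corrected <= hi)
```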