The data and methodological approach used in this study are summarized in Fig. 1 and Fig. 2. The first step classifies pathologist-defined global scores as A versus non-A. The second step refines the classification of the predicted non-A cases into B and D scores using subcluster annotations.
Images of Brain Tumor Samples, Annotations and Segmentation
Brain tumor samples were obtained from the Onconeurotek collection of the Pitié-Salpêtrière Hospital. The protocol was approved by the IRB (Onconeurotek 2.0). Tumor specimens and clinicopathological information were collected with informed consent and approval by an Institutional Review Board (N° IDRCB: 2023-A02763-42, NCT06314607), in accordance with national laws and the Declaration of Helsinki. The dataset includes diverse glioma types as defined by the 2021 World Health Organization (WHO) reference classification: oligodendroglioma IDH-mutant and 1p/19q-codeleted, astrocytoma IDH-mutant, glioblastoma IDH-wildtype, pleomorphic xanthoastrocytoma, ganglioglioma, and other rare variants. Tumor tissue sections were immunostained for the T cell marker CD3 with an automated stainer (Ultra, Roche Ventana) and scanned with a Zeiss Axioscan Z1 at 0.22 microns, generating whole-slide images (WSIs) in CZI format (2–3 GB per image). The total dataset comprised 214 images.
Neuropathologists annotated all images for tumor infiltration and peritumor zones and categorized them into four groups (A, B, C, D) based on the amount and pattern of T cell infiltration, referred to as the global score of the image. The groups were defined as follows (Fig. 1): global score A: absence of CD3-positive cells within the tumor, or presence of only a few, as assessed by neuropathologists; global score B: CD3-positive cells primarily surrounding blood vessels inside the tumor; global score C: CD3-positive cells primarily outside the tumor infiltration zone; global score D: diffuse infiltration of CD3-positive cells within the tumor. Cases with global score C were excluded from the dataset because their CD3-positive cells lay mainly outside the tumor zone. The final dataset consisted of 134 WSIs with global score A, 60 with global score B and 20 with global score D. In the two-step strategy, the first goal was to distinguish global score A from non-A (B and D), given the imbalance in the dataset. The split into training, validation and test sets was performed once and was identical for both models (XGBoost and 2D CNN). Although the dataset was initially split once, it was subsequently partitioned with a 10-fold scheme, yielding ten subsets, each containing its own training, validation, and test sets for cross-validation analysis.
Visiopharm® was used to manually segment tumor zones and to automatically segment nuclei and CD3-positive cells within tumor areas. The output XML files contained several object types: ROI (Type 1), CD3-positive cells (Type 4), CD3-positive cell nuclei (Type 5), and negative zones (Type 0). Types 0, 1, and 4 were parsed using 'xmltodict' (v0.13.0). Lymphocyte coordinates were extracted and their centers computed. Using Matplotlib (v3.9.0), we visualized CD3-positive cell centers and tumor zones, excluding CD3-positive cells located in non-tumor regions. To normalize tumor sizes, we defined consistent global boundaries across all cases based on coordinate ranges. Each full mask measured 7200×7200 pixels and was produced in two versions: with or without the tumor area (Type 1).
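For illustration, a minimal sketch of the parsing step is given below. The exact Visiopharm XML schema is not described in the text, so the element and attribute names used here ("Annotations", "Annotation", "Vertices", "@X", "@Y", "@Type") are assumptions; only the use of xmltodict and the object types follow the description above.

```python
# Hypothetical sketch: schema names are assumed, not taken from the Visiopharm export.
import numpy as np
import xmltodict

def load_object_centers(xml_path, keep_types=(0, 1, 4)):
    """Parse an annotation file and return one (x, y) center per object of each kept type."""
    with open(xml_path) as fh:
        doc = xmltodict.parse(fh.read(), force_list=("Annotation", "Vertex"))

    centers = {t: [] for t in keep_types}
    for obj in doc["Annotations"]["Annotation"]:          # assumed layout
        obj_type = int(obj["@Type"])
        if obj_type not in keep_types:
            continue
        vertices = obj["Vertices"]["Vertex"]
        xy = np.array([[float(v["@X"]), float(v["@Y"])] for v in vertices])
        centers[obj_type].append(xy.mean(axis=0))         # object center = mean of its vertices
    return centers
```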
Patch-based analysis
To capture the fine-grained lymphocyte infiltration patterns of the images, full masks were split into 50x50-pixel tiles with the Python package patchify. Only patches with less than 25% white pixels (i.e. pixels without lymphocytes) were retained. In total, 93,214 patches were obtained from the 214 full masks.
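A minimal sketch of the tiling and filtering step is shown below, assuming the full mask is a 2D grayscale array in which white (255) encodes lymphocyte-free pixels; the 50-pixel tile size and the 25% threshold follow the text.

```python
# Sketch of the patch extraction step; the white_value convention is an assumption.
import numpy as np
from patchify import patchify

def tile_mask(mask, patch_size=50, white_value=255, max_white_frac=0.25):
    """Split a full mask into non-overlapping 50x50 tiles and keep the informative ones."""
    tiles = patchify(mask, (patch_size, patch_size), step=patch_size)
    kept = []
    for row in tiles:
        for patch in row:
            if np.mean(patch == white_value) < max_white_frac:   # less than 25% white pixels
                kept.append(patch)
    return kept
```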
VGG16 (Keras v2.11.0) was used for feature extraction (Fig. 2). This pretrained neural network outputs 512 features per patch, which were then used as input to the K-means clustering algorithm (scikit-learn v1.3.0) to group the patches into 75 clusters. The method clustered visually similar image patches and integrated these clusters into a classification model. VGG16 was selected for feature extraction because of evidence supporting its accuracy in brain tumor classification (Srinivas et al., 2022). Chen and coauthors used a fusion of ResNet101, DenseNet121, and EfficientNetB0, achieving 0.9918 accuracy (W. Chen et al., 2024), while Saeedi et al. applied deep learning for early tumor detection (Saeedi et al., 2023). Beyond brain tumors, VGG16 has been used in breast cancer detection with principal component analysis (PCA) for dimensionality reduction (Alrubaie et al., 2023), and in pancreatic cancer detection combined with XGBoost (Bakasa & Viriri, 2023), highlighting its effectiveness in diverse medical imaging tasks.
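The feature-extraction and clustering step can be sketched as follows. The conversion of single-channel mask patches to three-channel inputs and the use of global average pooling to obtain the 512-dimensional descriptor are assumptions; the 512 features, K-means and the 75 clusters follow the text.

```python
# Sketch of VGG16 feature extraction followed by K-means clustering of the patches.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from sklearn.cluster import KMeans

def extract_features(patches):
    """patches: array of shape (n, 50, 50) with values in [0, 255]."""
    rgb = np.repeat(patches[..., None], 3, axis=-1).astype("float32")     # 1 -> 3 channels
    backbone = VGG16(weights="imagenet", include_top=False,
                     pooling="avg", input_shape=patches.shape[1:] + (3,))
    return backbone.predict(preprocess_input(rgb))                        # (n, 512)

def cluster_patches(features, n_clusters=75, seed=0):
    kmeans = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    return kmeans, kmeans.fit_predict(features)                           # one cluster id per patch
```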
In our study, after obtaining the 75 clusters, we counted the number of patches assigned to each cluster within a given mask image. Next, we calculated the proportion of patches in each cluster relative to the total number of patches in that mask. In this way, each mask image was represented by a percentage distribution across clusters (see Table 1 in the supplementary material). This compact representation was paired with the corresponding global score and effectively preserved biologically relevant spatial information related to the diffusion patterns of CD3-positive cells.
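A sketch of this per-mask representation is given below (function name illustrative):

```python
# Sketch: percentage of a mask's patches assigned to each of the 75 clusters.
import numpy as np

def mask_profile(cluster_ids, n_clusters=75):
    """cluster_ids: cluster assignments of all patches belonging to one mask image."""
    counts = np.bincount(np.asarray(cluster_ids), minlength=n_clusters)
    return 100.0 * counts / counts.sum()        # length-75 feature vector for the classifier
```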
To search for the best method, H2O AutoML (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html) was run for 24 hours using 5-fold cross-validation, class balancing, and accuracy as the main evaluation measure. The best model (ranked by AUC), an XGBoost model, was selected from more than 1000 candidates. XGBoost constructs an ensemble of decision trees sequentially, where each new tree corrects the errors of the previous ones using gradient descent on a loss function. It incorporates regularization, shrinkage, and column subsampling to improve generalization and prevent overfitting.
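The AutoML search can be sketched as follows; the file name and column names are hypothetical placeholders, and only the 24-hour budget, 5-fold cross-validation, class balancing and AUC ranking follow the text.

```python
# Sketch of the H2O AutoML search over the per-WSI cluster-proportion table.
import h2o
from h2o.automl import H2OAutoML

h2o.init()
frame = h2o.import_file("cluster_proportions.csv")      # hypothetical file: 75 columns + "score"
frame["score"] = frame["score"].asfactor()              # binary target: A vs non-A
predictors = [c for c in frame.columns if c != "score"]

aml = H2OAutoML(max_runtime_secs=24 * 3600,             # 24-hour search budget
                nfolds=5,                               # 5-fold cross-validation
                balance_classes=True,                   # class balancing
                sort_metric="AUC",
                seed=1)
aml.train(x=predictors, y="score", training_frame=frame)
print(aml.leaderboard.head())                           # leading model was an XGBoost model
```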
To estimate the global score, a patch-based method was developed using features extracted from VGG16, followed by clustering and classification with this XGBoost model. The model was trained to distinguish WSIs classified as Group A from non-A (Groups B and D combined). The selected model used XGBoost with the "Dropouts meet Multiple Additive Regression Trees" (DART) booster for binary classification, with 37 trees (depth = 15) and 5-fold cross-validation.
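For reference, an equivalent configuration expressed with the standalone xgboost package is sketched below; only the DART booster, the 37 trees and the depth of 15 come from the text, and all other hyperparameters are left at library defaults.

```python
# Sketch of the selected model's configuration with the xgboost scikit-learn API.
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score

clf = XGBClassifier(booster="dart",     # Dropouts meet Multiple Additive Regression Trees
                    n_estimators=37,    # 37 trees
                    max_depth=15,
                    eval_metric="auc")
# X: (n_wsi, 75) cluster-proportion vectors; y: 0 for score A, 1 for non-A
# scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")   # 5-fold cross-validation
```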
Density Map Analysis
A second approach relied on the density of T cells. Grayscale density maps were created to capture T cell density. CD3-positive cell full masks (without the border line of the tumor zone) were divided into overlapping 100x100-pixel windows (50-pixel stride). The mean lymphocyte density was computed in each window and rendered as a grayscale map. These maps were then used as input to a classifier (Fig. 2).
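A minimal sketch of the density-map construction is shown below, assuming the mask is a 2D binary array in which 1 marks a pixel belonging to a CD3-positive cell; the window size and stride follow the text, while the normalization to 8-bit grayscale is an assumption.

```python
# Sketch of the sliding-window density map (100x100 windows, 50-pixel stride).
import numpy as np

def density_map(mask, window=100, stride=50):
    rows = (mask.shape[0] - window) // stride + 1
    cols = (mask.shape[1] - window) // stride + 1
    dmap = np.zeros((rows, cols), dtype="float32")
    for i in range(rows):
        for j in range(cols):
            win = mask[i * stride:i * stride + window, j * stride:j * stride + window]
            dmap[i, j] = win.mean()                              # mean lymphocyte density
    return (255 * dmap / max(dmap.max(), 1e-8)).astype("uint8")  # 8-bit grayscale map
```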
To estimate the global score in a different manner, a density-map-based approach was implemented using a custom 2D CNN architecture. This model also performed Group A vs non-A classification. The 2D CNN (implemented with TensorFlow) had 4 Conv2D layers (3x3, ReLU), MaxPooling2D (2x2), Flatten, Dense (256, ReLU), and a final Dense (1, sigmoid) layer. It was trained for 3000 epochs with a batch size of 15 and a learning rate decayed from 10⁻⁴ to 10⁻⁶ over 3000 steps. In total, the model had 20.18M trainable parameters.
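A minimal Keras sketch of this architecture is given below. The layer sequence, number of epochs, batch size and learning-rate endpoints follow the text; the input size, filter counts, pooling placement and the linear decay schedule are assumptions, so the parameter count of this sketch will not necessarily match the reported 20.18M.

```python
# Sketch of the density-map CNN; filter counts and input shape are assumed.
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn(input_shape=(143, 143, 1), filters=(32, 64, 128, 256)):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=input_shape))
    for f in filters:                                    # 4 Conv2D (3x3, ReLU) blocks
        model.add(layers.Conv2D(f, (3, 3), activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))     # probability of non-A
    return model

schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=1e-4, decay_steps=3000, end_learning_rate=1e-6)
model = build_cnn()
model.compile(optimizer=tf.keras.optimizers.Adam(schedule),
              loss="binary_crossentropy", metrics=[tf.keras.metrics.AUC()])
# model.fit(train_maps, train_labels, epochs=3000, batch_size=15,
#           validation_data=(val_maps, val_labels))
```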
Grad-CAM visualizations were generated using the final convolutional layer of the CNN model to highlight regions contributing to the classification (Chattopadhyay et al., 2018; Selvaraju et al., 2020).
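The Grad-CAM computation on the final convolutional layer can be sketched as follows; the layer name is a placeholder, and the single sigmoid output follows the architecture described above.

```python
# Sketch of Grad-CAM on the last convolutional layer of the trained CNN.
import tensorflow as tf

def grad_cam(model, image, conv_layer_name):
    """image: (H, W, 1) density map; returns a normalized heatmap over the feature map."""
    grad_model = tf.keras.Model(model.inputs,
                                [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, prediction = grad_model(image[None, ...])
        score = prediction[:, 0]                       # sigmoid score for "non-A"
    grads = tape.gradient(score, conv_out)             # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))       # global-average-pooled gradients
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)[0]
    cam = tf.nn.relu(cam)                              # keep positively contributing regions
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```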
Spatial Analysis based on Labeled Patterns
To differentiate between the types of lymphocyte aggregation corresponding to global scores B and D, expert pathologists annotated the patch clusters (Fig. 3). The 75 clusters were classified into six subclusters (1, 2, 3, 4, E, and O) based on tissue morphology and CD3-positive lymphocyte density. For each cluster, 25 random patches were selected to determine its label, amounting to approximately 2% of the entire dataset (75 x 25 / 93,214). This constituted our weak labelling of patches. Subclusters 1 to 4 reflected increasing levels of T cell infiltration, with subcluster 4 indicating the highest diffusion within the tumor tissue. Subcluster E (for Exclusion) corresponded to patches where T cells were densely packed near blood vessels without evidence of infiltration. Subcluster O included background patches with no relevant tissue or cellular information.
A pattern was defined as a 3x3 array of patches, i.e. each pattern comprised 9 patches, each carrying its subcluster label. This array inherently contains the spatial organization of the patches. However, for our analysis it is irrelevant whether a specific subcluster label appears before or after another patch in the mask image; we therefore rearranged the labels in alphabetical order (see Fig. 4). We observed unique patterns of subcluster groups per global score group. We thus hypothesized that the presence or absence of these distinct and unique patterns could distinguish global score groups B and D, once the A vs non-A classification was achieved. Following this, we analyzed the top 50 most frequent patterns per group to determine whether specific subcluster patterns characterized global score groups B and D. Patterns were eliminated if background (subcluster O) or absence of T cells in the tumor zone (subcluster 1) accounted for 5 or more of the 9 patches in the array. For each global score group, the top 50 most frequent patterns were selected as a reference, resulting in three reference groups (A, B, D). To classify a query case, we compared its patterns to the reference patterns of each group and counted the common patterns. Although patterns describing global score A were also identified, they were ignored at this stage because the A vs non-A classification had already been achieved. The R package ggseqlogo (https://cran.r-project.org/web/packages/ggseqlogo/index.html) was used to visualize the top 50 most frequent patterns of the reference groups.
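A sketch of the pattern extraction and matching is given below, assuming each mask's patch labels are arranged in a 2D grid of subcluster symbols ("1", "2", "3", "4", "E", "O"); the elimination rule is interpreted here as the combined count of O and 1 patches reaching 5 out of 9, and the tie-breaking when a query shares equally many patterns with both references is arbitrary.

```python
# Sketch of the 3x3 pattern analysis; function and variable names are illustrative.
from collections import Counter

def extract_patterns(label_grid):
    """Order-invariant 3x3 patterns of one mask, filtered as described in the text."""
    rows, cols = len(label_grid), len(label_grid[0])
    patterns = []
    for i in range(rows - 2):
        for j in range(cols - 2):
            window = [label_grid[i + di][j + dj] for di in range(3) for dj in range(3)]
            # discard windows dominated by background (O) or lymphocyte-free tissue (1)
            if window.count("O") + window.count("1") >= 5:
                continue
            patterns.append("".join(sorted(window)))   # alphabetical order discards spatial order
    return patterns

def reference_patterns(group_masks, top_k=50):
    """Top-k most frequent patterns over all masks of one global-score group."""
    counts = Counter(p for grid in group_masks for p in extract_patterns(grid))
    return {p for p, _ in counts.most_common(top_k)}

def classify_query(label_grid, ref_b, ref_d):
    """Assign B or D by counting patterns shared with each reference group."""
    query = set(extract_patterns(label_grid))
    return "B" if len(query & ref_b) >= len(query & ref_d) else "D"
```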