Study Population
We conducted a feasibility study on a subset of 200 individuals selected from the Paris Prospective Study III (PPS3) cohort. A targeted sampling approach was used: individuals with Type 2 Diabetes (T2D) were oversampled by up to 20% to capture more extreme phenotypes associated with the condition, and carotid plaques were required to be present in approximately 20% of the sample so that vascular complications relevant to T2D were represented. To cover a wide range of blood pressure values, the sample was divided into thirds: one-third within the 25th to 75th percentile range (120-141 mmHg), one-third below the 25th percentile (<120 mmHg), and one-third above the 75th percentile (≥141 mmHg). Finally, individuals aged 50 to 75 years were sampled uniformly, ensuring a consistent distribution across the age range.
PPS3 is an ongoing community-based prospective observational study conducted in Paris, France (32). The study protocol was approved by the Ethics Committee of Cochin Hospital (Paris, France) and was registered on the World Health Organization International Clinical Trials Registry platform (NCT00741728) on 08/25/2008. A total of 10,157 men and women aged 50–75 years were enrolled and, after signing an informed consent form, underwent a comprehensive preventive medical check-up. Vascular ultrasound (US) was performed using a PICUS machine (Esaote, Genova, Italy) with a 128-line radiofrequency (RF) linear array transducer operating at 7.5 MHz. The raw RF data were preserved to facilitate in-depth analysis. For inclusion, the intima-blood interface had to be visible in at least part of the far wall of the right common carotid artery in a clear reconstructed B-mode image. Further details are available in the publication by the PPS3 Study Group (32).
Ultrasound Data Processing
First, we developed a graphical user interface (GUI) in MATLAB (version R2022b; MathWorks, Inc., Massachusetts, USA) to process raw RF signals and to reconstruct and process B-mode images. We then identified 178 radiomic features (see description below) to be calculated from the selected region of interest (ROI).
Building on previous work (33), RF signals were transformed into B-mode ultrasound images using standard techniques.
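The text does not specify which "standard techniques" were used. A common pipeline is envelope detection via the analytic signal followed by log compression, and the sketch below illustrates that pipeline for a single RF line. It is a plain-Python illustration only, not the MATLAB GUI code: the function names, the 60 dB dynamic range, and the naive DFT (adequate only for short demonstration signals) are all assumptions.

```python
import cmath
import math


def analytic_signal(rf_line):
    """Analytic signal of one RF line via a naive O(N^2) DFT (demo-sized inputs only)."""
    n = len(rf_line)
    spectrum = [
        sum(rf_line[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double positive frequencies,
    # and zero the negative frequencies.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        gain[k] = 2.0
    return [
        sum(spectrum[k] * gain[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def envelope(rf_line):
    """Envelope = magnitude of the analytic signal."""
    return [abs(v) for v in analytic_signal(rf_line)]


def log_compress(env, dynamic_range_db=60.0):
    """Map an envelope to 0..255 grey levels over an assumed dynamic range."""
    peak = max(env)
    if peak <= 0:
        return [0] * len(env)
    pixels = []
    for e in env:
        db = 20.0 * math.log10(max(e, 1e-12) / peak)  # 0 dB at the peak
        db = max(db, -dynamic_range_db)                # clip the noise floor
        pixels.append(round(255 * (db + dynamic_range_db) / dynamic_range_db))
    return pixels
```

Applying `log_compress(envelope(line))` to each of the 128 RF lines of a frame would give one B-mode image column per line under these assumptions.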
Radiomic Features
A total of 74 radiomic B-mode features and 104 radiomic radiofrequency (RF) features were computed with the GUI. The B-mode features comprised 1) first-order statistics (34), 2) higher-order textural features (35–37), 3) transform-based wavelet features (38), and 4) fractal analysis features (39,40). The RF features comprised (41) 1) time series features, computed individually for each RF time series within the region of interest (ROI), with the mean value computed over 30 frames to derive time-domain characteristics (42); 2) frequency-domain features, obtained by applying a Fourier transform to acquire the frequency spectrum and then fitting a straight line to the normalized spectrum (43,44); 3) the Nakagami m parameter, extracted from the Nakagami distribution mean diagram (NDM) parametric map (43,45); 4) spectral features (44); and 5) feature maps, namely the direct energy attenuation diagram (DEA) and the RF signal skewness intensity diagram (RF-I), to which texture analysis was applied to extract first-order statistics and higher-order textural features from each map (43,46).
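As an illustration of one of the RF features listed above, the Nakagami shape parameter can be estimated from envelope samples with the standard moment estimator, m = (E[R²])² / Var(R²), with scale Ω = E[R²]. The sketch below is a minimal example under that assumption; the function name is hypothetical, and the source does not state which estimator or window size the GUI uses to build the NDM map.

```python
import statistics


def nakagami_moments(envelope_samples):
    """Moment-based estimate of the Nakagami shape (m) and scale (omega).

    envelope_samples: envelope amplitudes R taken from a ROI (or from one
    sliding window, if a parametric map is being built).
    """
    power = [r * r for r in envelope_samples]   # R^2
    omega = statistics.fmean(power)             # scale: E[R^2]
    var_power = statistics.pvariance(power)     # Var(R^2)
    if var_power == 0:
        raise ValueError("constant envelope: m is undefined")
    m = omega ** 2 / var_power
    return m, omega
```

For a fully developed speckle pattern (Rayleigh-distributed envelope) the estimate is close to m = 1; values below or above 1 indicate pre-Rayleigh or post-Rayleigh statistics respectively.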
Data Extraction Settings
Each original 6-second acquisition (128 RF lines, 4 cm depth, 30 frames per second) yielded 180 B-mode images (frames). The ROI was manually selected on the B-mode image with a rectangular bounding box capturing the intima-media (IM) complex on the far wall of the right carotid artery. Three end-diastolic frames were selected for each patient. For each frame, four ROI sizes (1 mm, 1.2 mm, 1.4 mm, and 1.6 mm) were extracted at the same location, with the bounding box centered on the smoothest section of the far wall to ensure optimal visualization of the intima-blood interface (as depicted in Fig. 1). The bounding box encompassed the blood-intima interface, with minimal blood lumen on one side and the adventitia on the other. The bounding box was initially set at 1 mm and then expanded in 0.2 mm steps towards the adventitia while its position was maintained, yielding the four ROI sizes. A fifth ROI, termed the variable ROI, was defined as whichever of the four sizes most precisely covered the IM complex, as judged visually. Once extracted with the GUI, the features were normalized before feature selection.
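The acquisition and ROI geometry described above can be sketched as follows. This is an illustrative helper, not the GUI code: the function names and the example depth of 20 mm for the lumen-side edge are hypothetical, and depths are expressed in millimetres because the pixel spacing is not given in the text.

```python
def frame_count(duration_s, frames_per_second):
    """Number of B-mode frames in one acquisition."""
    return duration_s * frames_per_second


def roi_boxes(lumen_edge_mm, base_mm=1.0, step_mm=0.2, n_sizes=4):
    """Depth ranges (top, bottom) in mm for the fixed-size ROIs.

    The lumen-side edge stays fixed; the box grows towards the adventitia,
    which lies deeper than the lumen on the far wall.
    """
    return [
        (lumen_edge_mm, round(lumen_edge_mm + base_mm + i * step_mm, 3))
        for i in range(n_sizes)
    ]


if __name__ == "__main__":
    print(frame_count(6, 30))   # 6 s at 30 frames per second
    for top, bottom in roi_boxes(20.0):
        print(f"ROI from {top} mm to {bottom} mm")
```

The variable ROI would then be chosen by the operator as one of the four returned boxes.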
Statistical Analysis and Feature Engineering
Descriptive statistics for population variables are presented as mean ± standard deviation (SD) or as counts (n) and percentages (%). First, we evaluated feature stability across the three frames of the same clip and the five ROI sizes of each frame by intraclass correlation coefficient (ICC) analysis, with a threshold of ICC > 0.50. We applied a two-way mixed-effects model to calculate absolute agreement, treating ROI sizes as fixed effects and individuals as random effects (44). Second, for the subset of features with ICC > 0.50, we investigated reproducibility across the three frames for the prediction of chronological age (a proxy of vascular ageing), using least absolute shrinkage and selection operator (Lasso, L1 regularization) regression for feature selection (47). The following metrics were compared: model performance for age prediction, measured by mean square error (MSE) and R², and the number and type of selected features. These metrics were calculated on four datasets, all using the variable ROI size: three containing the features extracted from each of the three selected frames, and one containing their median values. Internal validation was performed using an 80/20 split-sample technique.
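For readers unfamiliar with the stability criterion, the sketch below computes a two-way, absolute-agreement ICC for one feature from a table with one row per individual and one column per ROI size (or frame). It uses the single-measurement form, ICC = (MSR − MSE) / (MSR + (k − 1)·MSE + (k/n)·(MSC − MSE)); whether the single or the average form was used is not stated in the text, so that choice, like the function name, is an assumption. The analysis itself was run in R.

```python
def icc_absolute_agreement(table):
    """Two-way, absolute-agreement, single-measurement ICC.

    table: list of rows, one per individual; each row holds the same feature
    measured under k conditions (e.g. ROI sizes).
    """
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-individual mean square
    ms_cols = ss_cols / (k - 1)                 # between-condition mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


def stable_features(feature_tables, threshold=0.50):
    """Names of features whose ICC exceeds the threshold."""
    return [
        name for name, table in feature_tables.items()
        if icc_absolute_agreement(table) > threshold
    ]
```

Because absolute agreement penalises systematic offsets between conditions, a feature that shifts by a constant amount with ROI size scores lower than it would under a consistency definition.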
Third, the impact of variation in ROI size on model performance and on the selected features for chronological age prediction was also investigated by Lasso (L1) regression. The following metrics were compared: model performance for age prediction, measured by mean square error (MSE) and R², and the number and type of selected features. These metrics were calculated on the five datasets containing, for each feature, the median value across the three frames, one dataset per ROI size (1.0 mm, 1.2 mm, 1.4 mm, 1.6 mm, and variable ROI). Internal validation was performed using an 80/20 split-sample technique. Additionally, we performed sensitivity analyses by applying Minimum Redundancy Maximum Relevance (MRMR) and stepwise feature selection methods instead of Lasso, to assess the stability and reproducibility of the results under other feature selection techniques (see workflow diagram, Fig. 2). The analyses were carried out using RStudio version 2023.9.0.463 (Boston, MA), with the glmnet, mlr, caret, dplyr, mRMRe, e1071, and tidyverse packages.
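The feature-selection step can be illustrated with a minimal Lasso solved by cyclic coordinate descent, together with the per-feature median across frames and the two reported metrics. This is a plain-Python sketch for exposition only: the study used the glmnet package in R, and the function names, the fixed penalty `lam`, and the fixed iteration count are assumptions (in practice the penalty is usually tuned by cross-validation and features are standardized first).

```python
import statistics


def median_over_frames(frame_features):
    """Per-feature median across frames (one feature vector per frame)."""
    return [statistics.median(values) for values in zip(*frame_features)]


def soft_threshold(value, lam):
    """Shrink towards zero by lam; the source of Lasso sparsity."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - b0 - Xb||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - x_mean[j] for row in X] for j in range(p)]  # centred columns
    resid = [yi - y_mean for yi in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            z = sum(c * c for c in col) / n
            if z == 0:
                continue  # constant feature carries no information
            # Correlation of feature j with the partial residual.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def mse_r2(y_true, y_pred):
    """Mean square error and coefficient of determination."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return ss_res / n, 1.0 - ss_res / ss_tot
```

Features whose coefficient is exactly zero after fitting are the ones discarded by the selection; under an 80/20 split, `lasso_cd` would be fitted on the 80% portion and `mse_r2` evaluated on the held-out 20%.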