Study design and participants
This study was part of the Pre-Twin Screen study funded by EP PerMed (project # JTC2019-61), which aims to develop a multi-marker, personalized prenatal diagnostic model to predict feto-maternal complications in twin pregnancies19. Enrolment started in December 2020 and ended in August 2023. Women with two live monochorionic diamniotic (MCDA) or dichorionic (DC) twins at 11+0 to 13+6 weeks’ gestation, calculated from the crown-rump length (CRL) of the larger fetus20, were enrolled. Women were included if they delivered two live, non-malformed neonates at >24 weeks’ gestation. These criteria were fulfilled by 596 women: 75 from Rome, Italy; 75 from Montreal, Canada; 93 from Barcelona, Spain; 99 from Tubingen and 141 from Bonn, Germany; and 113 from Zerifin, Israel (Table 1).
The master ethical approval for the study was obtained from the Shamir (Assaf Harofe) Medical Center (Trial # 0043-20-ASF) and the Israel Ministry of Health (# 202016632). It was subsequently endorsed in all other participating centres. All participants provided written informed consent. The protocol was registered at ClinicalTrials.gov (ID: NCT04595214).
Investigations in the first trimester
At enrolment, we recorded maternal demographics and medical and pregnancy history, including maternal age, BMI, ethnic origin, GDM in a previous pregnancy (for multiparous participants), and family history of GDM. We also recorded features of the current pregnancy, including the mode of conception and chorionicity, among others12,14. Blood cell counts, blood glucose levels after overnight fasting, and blood groups were determined from blood samples. Ultrasound was used to determine the nuchal translucency (NT) width19,21. Estimated fetal weight (EFW) was determined for each twin according to Hadlock et al.22, using the four-parameter formula based on the biparietal diameter (BD), head circumference (HC), abdominal circumference (AC), and femur length (FL), at any of 11-13, 20-22, 24-26, 28-30, 32-34 and 36-37 weeks’ gestation, unless the pregnancy had been delivered earlier. In MCDA twin pregnancies, additional ultrasound scans were carried out at 15-16 and 17-18 weeks’ gestation. For this study, we used the values from 11-13, 20-22 and 32-33 weeks. The mean uterine artery pulsatility index (UtA-PI) of the left and right uterine arteries was measured by transvaginal or transabdominal color Doppler ultrasound24. Mean arterial pressure (MAP) was evaluated by validated automated devices and a standardized protocol25.
We measured the serum levels of pregnancy-associated plasma protein A (PAPP-A), placental growth factor (PlGF), and soluble fms-like tyrosine kinase 1 (sFLT-1) by automated analyzers (Elecsys Analyzer, Roche Diagnostics International AG, Switzerland; Delfia Express, Revvity, Turku, Finland; or BRAHMS KRYPTOR compact PLUS, Thermo Fisher Scientific, Germany). Cell-free fetal DNA (cffDNA) fraction was determined as part of the examination of maternal blood to identify major trisomies26.
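The four-parameter EFW calculation mentioned above can be sketched as follows. The coefficients are those of the commonly cited Hadlock four-parameter regression (all measurements in centimetres, weight in grams); they are not printed in this paper, so they should be checked against reference 22 before any use. The function name and argument names are illustrative only.

```python
def hadlock_efw_grams(bpd_cm: float, hc_cm: float, ac_cm: float, fl_cm: float) -> float:
    """Estimated fetal weight in grams from four biometric measurements (cm).

    Assumption: coefficients of the commonly cited Hadlock four-parameter
    formula; verify against the original publication.
    """
    log10_efw = (
        1.3596
        - 0.00386 * ac_cm * fl_cm   # AC x FL interaction term
        + 0.0064 * hc_cm            # head circumference
        + 0.00061 * bpd_cm * ac_cm  # BPD x AC interaction term
        + 0.0424 * ac_cm            # abdominal circumference
        + 0.174 * fl_cm             # femur length
    )
    return 10.0 ** log10_efw


# Illustrative (made-up) third-trimester measurements, in cm.
example_efw = hadlock_efw_grams(bpd_cm=8.0, hc_cm=29.0, ac_cm=27.0, fl_cm=6.0)
```

The formula works on the log10 scale, so small measurement errors in AC or FL translate into proportional, not additive, errors in the estimated weight.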
Investigations in the second and third trimesters
Except for CRL, NT, cffDNA fraction, and blood type, which were determined only in the first trimester, and the glucose challenge test (GCT) and oral glucose tolerance test (OGTT), which are performed only once at 24-28 weeks’ gestation, all values measured in the first trimester were also determined in the second and third trimesters. Ultrasound scans to identify malformations, blood cell counts, hemoglobin, and blood biochemistry for glucose, iron, PlGF, sFLT-1, and PAPP-A were conducted in any of the first, second, and third trimesters.
Delivery data
Delivery data were extracted from the electronic medical records of participating hospitals or, if delivery occurred outside the enrolling hospital, from hospital discharge records and phone interviews with the women. The outcome measure was delivery with GDM. Preterm delivery (PTD) was defined as any delivery before 37 weeks’ gestation27. The values entered cover the entire process and mode of delivery, any test taken between admission and delivery, newborn details, and neonatal intensive care unit (NICU) data if required.
Diagnosis of GDM
GDM was diagnosed at 24-28 weeks of gestation according to the guidelines of the American College of Obstetricians and Gynecologists28, with slight local variations. First, a 50 g GCT was conducted; a result above 200 mg/dL was considered positive. If the value was >140 but below 200 mg/dL, a second-step 100 g OGTT was performed in the morning after overnight fasting. Women were considered positive if two or more of the four measurements were ≥95 mg/dL at fasting (time zero), or ≥180, ≥155, and ≥145 mg/dL at 1, 2, and 3 h, respectively. In Barcelona, the National Diabetes Data Group criteria29 were followed, which stipulate thresholds of 105 mg/dL at fasting (time zero) and 190 mg/dL, 165 mg/dL, and 145 mg/dL at 1, 2, and 3 h, respectively. In patients in whom the GCT or OGTT could not be completed, blood glucose levels were evaluated in the morning and 1 hour after each meal, and if values were pathological, a diagnosis of GDM was made. Following diagnosis, women with GDM were treated with nutritional intervention, metformin, insulin, or their combinations, as necessary, to improve outcomes. Clinical management was based on the GCT and OGTT testing at 24-28 weeks (excluding chronic diabetes).
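The two-step screening rule described above can be written as a small decision function. This is a minimal sketch using the thresholds quoted in the text, not the study's actual software. The function name, the return labels, and the handling of a GCT value of exactly 200 mg/dL (treated here as positive) are assumptions, and the fallback based on home glucose monitoring is not modelled.

```python
from typing import Optional, Sequence

# OGTT thresholds in mg/dL at fasting, 1 h, 2 h, 3 h, as quoted in the text.
DEFAULT_THRESHOLDS = (95, 180, 155, 145)
NDDG_THRESHOLDS = (105, 190, 165, 145)  # criteria followed in Barcelona


def classify_gdm(
    gct_mg_dl: float,
    ogtt_mg_dl: Optional[Sequence[float]] = None,
    thresholds: Sequence[float] = DEFAULT_THRESHOLDS,
) -> str:
    """Return 'GDM', 'no GDM', or 'OGTT required' for one woman."""
    if gct_mg_dl >= 200:  # assumption: exactly 200 mg/dL counts as positive
        return "GDM"
    if gct_mg_dl <= 140:  # screen negative, no second step
        return "no GDM"
    if ogtt_mg_dl is None:  # intermediate GCT: the 100 g OGTT is needed
        return "OGTT required"
    if len(ogtt_mg_dl) != len(thresholds):
        raise ValueError("expected one OGTT value per threshold")
    # Count measurements at or above their time-specific threshold.
    n_abnormal = sum(v >= t for v, t in zip(ogtt_mg_dl, thresholds))
    return "GDM" if n_abnormal >= 2 else "no GDM"
```

For example, `classify_gdm(160, (96, 185, 150, 140))` counts two abnormal values under the default thresholds, whereas the same OGTT values fall below every National Diabetes Data Group threshold.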
Machine learning and statistical methods
During the study, databases were shared with the data manager every month, and missing entries that were overlooked initially were subsequently completed from the source site records. As such, there were practically no missing data, and the few missing values were replaced by the median.
The data were converted into Z-scores using the training-set mean and standard deviation. Categorical parameters were represented using one-hot encoding and were not normalized. For the prediction, we tested XGBoost30, logistic regression, and Light Gradient Boosting Machine (Lgbm)31. For the logistic regression, ridge regularization was used with a coefficient of 1.0. For XGBoost, 50 trees were used, with a max depth of 4, gamma=8, and eta=1/3. For Lgbm, 50 trees were used, with a learning rate of 0.1, a bagging fraction of 0.7 for both samples and features, and a minimum of 20 samples per leaf. Given the limited size of the sample, no hyperparameter tuning was performed for the results in the main text. A similar analysis with hyperparameter optimization is reported in the supplementary material.
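The preprocessing and model settings described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the text does not say whether the imputation median comes from the training set (assumed here), nor which standard deviation estimator was used (sample SD assumed). The parameter dictionaries map the reported settings onto the scikit-learn style names used by the Python XGBoost and LightGBM wrappers; interpreting the ridge "coefficient of 1.0" as `C=1.0` and setting `subsample_freq=1` are assumptions.

```python
import numpy as np


def fit_preprocessor(X_train: np.ndarray):
    """Learn imputation medians, means and SDs from the training set only."""
    medians = np.nanmedian(X_train, axis=0)
    filled = np.where(np.isnan(X_train), medians, X_train)
    means = filled.mean(axis=0)
    sds = filled.std(axis=0, ddof=1)  # assumption: sample standard deviation
    sds[sds == 0] = 1.0               # guard against constant columns
    return medians, means, sds


def apply_preprocessor(X: np.ndarray, medians, means, sds) -> np.ndarray:
    """Impute with training medians, then convert to Z-scores."""
    filled = np.where(np.isnan(X), medians, X)
    return (filled - means) / sds


def one_hot(labels, categories) -> np.ndarray:
    """One indicator column per category; these columns are not normalized."""
    labels = np.asarray(labels)
    return (labels[:, None] == np.asarray(categories)[None, :]).astype(float)


# Reported hyperparameters, expressed with commonly used Python names.
XGBOOST_PARAMS = {"n_estimators": 50, "max_depth": 4, "gamma": 8,
                  "learning_rate": 1 / 3}           # eta
LIGHTGBM_PARAMS = {"n_estimators": 50, "learning_rate": 0.1,
                   "subsample": 0.7,                # bagging fraction (samples)
                   "subsample_freq": 1,             # assumption: bagging active
                   "colsample_bytree": 0.7,         # bagging fraction (features)
                   "min_child_samples": 20}         # at least 20 samples per leaf
LOGREG_PARAMS = {"penalty": "l2", "C": 1.0}         # assumption: coefficient = C
```

Fitting the statistics on the training rows and reusing them on the test rows keeps information from the test set out of the normalization step.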
For each woman, we evaluated four groups of variables: 1) demographics and medical and obstetric history collected at the time of enrolment, and 2-4) marker values measured in each of the three pregnancy trimesters. The association of each feature with GDM (unaffected participants versus GDM patients) was assessed using the p-value of the point-biserial correlation coefficient, and the association among the different features was assessed using the correlation coefficient32.
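As an illustration of the point-biserial coefficient used above, the sketch below computes it for one continuous feature against a binary GDM label, together with the t statistic from which a p-value follows (t distribution with n-2 degrees of freedom). The function name is illustrative; in practice a library routine such as `scipy.stats.pointbiserialr` returns both the coefficient and the p-value.

```python
import numpy as np


def point_biserial(binary, values):
    """Point-biserial correlation between a 0/1 label and a continuous feature.

    Returns (r, t), where t follows a t distribution with n-2 degrees of
    freedom under the null hypothesis of no association.
    """
    binary = np.asarray(binary, dtype=bool)
    values = np.asarray(values, dtype=float)
    n1, n0 = binary.sum(), (~binary).sum()
    n = n1 + n0
    m1, m0 = values[binary].mean(), values[~binary].mean()
    s = values.std(ddof=0)  # population SD over all participants
    r = (m1 - m0) / s * np.sqrt(n1 * n0 / n**2)
    t = r * np.sqrt((n - 2) / (1.0 - r**2))
    return float(r), float(t)
```

The coefficient is numerically identical to the Pearson correlation between the 0/1 label and the feature, which is a convenient check on any implementation.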
In each trimester, and for each model, we used the cumulative information available up to that trimester. We divided the data 10 times randomly into 80% training and 20% test sets. We computed the area under the curve (AUC) of the receiver operating characteristic (ROC) curve for each split. In parallel, the predictions for all the test sets were combined to produce a single ROC for all continuous variables.
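The evaluation scheme above (ten random 80/20 splits, one AUC per split, plus one AUC on the pooled test predictions) can be sketched as below. This is a minimal illustration, not the study's code: `mean_difference_scorer` is a hypothetical stand-in for the actual models, and the splits are unstratified, so every test set is assumed to contain both classes.

```python
import numpy as np


def auc_score(y_true, scores) -> float:
    """AUC via the rank-sum identity, using average ranks for tied scores."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank
        i = j + 1
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    return float((ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg))


def mean_difference_scorer(X_train, y_train, X_test):
    """Hypothetical stand-in model: project onto the difference of class means."""
    w = X_train[y_train == 1].mean(axis=0) - X_train[y_train == 0].mean(axis=0)
    return X_test @ w


def repeated_split_auc(X, y, fit_predict, n_splits=10, test_fraction=0.2, seed=0):
    """Return (mean of per-split AUCs, AUC of pooled test predictions)."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_fraction * len(y)))
    per_split, pooled_y, pooled_scores = [], [], []
    for _ in range(n_splits):
        perm = rng.permutation(len(y))
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        scores = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        per_split.append(auc_score(y[test_idx], scores))
        pooled_y.append(y[test_idx])
        pooled_scores.append(scores)
    pooled_auc = auc_score(np.concatenate(pooled_y), np.concatenate(pooled_scores))
    return float(np.mean(per_split)), pooled_auc
```

The two summaries can differ: the pooled AUC compares scores produced by different model fits on a common scale, while the mean AUC only compares scores within each split.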
Continuous patient characteristics are presented as medians with interquartile range (IQR) and were compared by the Mann-Whitney U-test or the Kruskal–Wallis non-parametric test. Categorical values are presented as n (%) and were compared using the Chi-square test or Fisher’s exact test.
All estimates and statistical tests were performed using MATLAB version 2024a (MathWorks Inc., Natick, MA, USA). Power analysis was performed with WinPepi software Ver. 11.65 (http://www.brixtonhealth.com/pepi4windows.html). Note that the data needed for the prediction are available only upon request and following the appropriate ethics approvals.
The machine learning prediction accuracy was measured in two ways: by aggregating the predictions of all folds and computing a ROC curve and the resulting AUC on the combined data, and by averaging the AUC over all folds. The same was done for estimating marker efficacy. The results in the main text are without class stratification and without hyperparameter optimization. We also tested the same models with class stratification and hyperparameter optimization using Optuna.