3.1 Study Selection and Characteristics
The comprehensive database searches and supplementary strategies yielded a total of 8,547 records after removal of duplicates (Figure 1: PRISMA Flow Diagram). Title and abstract screening resulted in exclusion of 7,892 records that clearly did not meet eligibility criteria, leaving 655 records for full-text review. Inter-rater agreement for title/abstract screening was substantial (Cohen's κ = 0.86, 95% CI: 0.82-0.90). Full-text review was completed for 655 articles, of which 528 were excluded based on predetermined exclusion criteria. The most common reasons for exclusion at full-text stage were: studies not employing AI/ML predictive modelling approaches (n=198, 37.5%), studies not addressing African populations or contexts (n=147, 27.8%), studies focusing on disease diagnosis or clinical decision support rather than burden forecasting (n=104, 19.7%), studies addressing only communicable diseases (n=52, 9.8%), and duplicate publications or conference abstracts subsequently published as full articles (n=27, 5.1%). Ultimately, 127 studies met all inclusion criteria and were included in the final synthesis. Inter-rater agreement for full-text screening was excellent (Cohen's κ = 0.91, 95% CI: 0.87-0.94).
Figure 1. Systematic identification and selection of studies for inclusion in the scoping review. Database searches and supplementary sources yielded 8,997 records; 450 duplicates were removed. After title/abstract screening of 8,547 records, 655 underwent full-text review. Following application of eligibility criteria, 127 studies were included in the final synthesis. Inter-rater agreement: Cohen's κ=0.86 for title/abstract screening; κ=0.91 for full-text review.
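For readers unfamiliar with the agreement statistic reported above, Cohen's κ corrects raw percentage agreement for the agreement expected by chance alone. The function below is an illustrative, stdlib-only Python sketch for a two-rater include/exclude screening table; the function and variable names are ours and are not drawn from any included study.

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa from a 2x2 two-rater agreement table:
    a = both raters say 'include', d = both say 'exclude',
    b and c = the two kinds of disagreement."""
    n = a + b + c + d
    po = (a + d) / n                        # observed agreement
    p_yes = ((a + b) / n) * ((a + c) / n)   # chance both say 'include'
    p_no = ((c + d) / n) * ((b + d) / n)    # chance both say 'exclude'
    pe = p_yes + p_no                       # total chance agreement
    return (po - pe) / (1 - pe)
```

With perfect agreement (b = c = 0) the statistic is exactly 1; values in the 0.81-1.00 range are conventionally read as "almost perfect" agreement, consistent with the κ values reported for both screening stages.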
The temporal distribution of publications revealed accelerating growth in research output over time. Only 8 studies (6.3%) were published between 2010 and 2015, compared to 31 studies (24.4%) between 2016 and 2020, and 88 studies (69.3%) between 2021 and October 2025. The year 2023 represented peak publication output with 34 studies (26.8% of total). This temporal trend reflects both the broader proliferation of AI/ML applications in health and the increasing recognition of NCDs as a priority health challenge in Africa.
3.2 Geographical Distribution
Substantial geographical concentration characterized the included studies, with marked overrepresentation of a small number of African countries and a complete absence of research from others (Table 2). South Africa emerged as the most extensively studied setting, represented in 44 studies (34.6% of total). Kenya contributed 23 studies (18.1%), Nigeria 19 studies (15.0%), Egypt 12 studies (9.4%), and Ethiopia 11 studies (8.7%). The remaining 18 studies (14.2%) were distributed across 13 additional African countries: Ghana (5 studies), Uganda (3 studies), Tanzania (2 studies), Malawi (2 studies), Rwanda (1 study), Cameroon (1 study), Senegal (1 study), Zimbabwe (1 study), Botswana (1 study), Mauritius (1 study), Tunisia (1 study), Morocco (1 study), and Algeria (1 study).
Critically, 36 African countries (66.7% of the continent's 54 nations) were completely unrepresented in the identified literature. Geographical analysis revealed pronounced regional disparities. Eastern Africa contributed 38 studies (29.9%), Southern Africa 48 studies (37.8%), Western Africa 27 studies (21.3%), Northern Africa 14 studies (11.0%), and Central Africa was represented by only a single study from Cameroon (0.8%). Francophone and lusophone African nations were particularly underrepresented, with limited research from countries including the Democratic Republic of Congo, Mozambique, Angola, Burkina Faso, Mali, Niger, Chad, and others.
Multi-country collaborative studies were rare, representing only 11 studies (8.7% of total). These predominantly involved partnerships between South African and other Southern African Development Community (SADC) nations or collaborations between East African Community member states. Continental-scale studies encompassing multiple African regions were entirely absent from the identified literature.
Table 2: Distribution of Studies by African Country and Region
| Region | Countries (Number of Studies) | Total Studies | % of Total |
|---|---|---|---|
| Southern Africa | South Africa (44), Botswana (1), Zimbabwe (1), Malawi (2), Mauritius (1) | 48 | 37.8% |
| Eastern Africa | Kenya (23), Ethiopia (11), Uganda (3), Tanzania (2), Rwanda (1) | 38 | 29.9% |
| Western Africa | Nigeria (19), Ghana (5), Senegal (1), Cameroon (1) | 27 | 21.3% |
| Northern Africa | Egypt (12), Tunisia (1), Morocco (1), Algeria (1) | 14 | 11.0% |
| Central Africa | Cameroon (1) | 1 | 0.8% |
Table 2. Geographic distribution of included studies across African regions and countries
Distribution of 127 included studies across five African regions (Southern, Eastern, Western, Northern, and Central Africa) and 18 individual countries. Study counts and percentages are provided for each country and region. South Africa (n=44, 34.6%), Kenya (n=23, 18.1%), and Nigeria (n=19, 15.0%) accounted for 67.7% of all studies, while 36 African countries (66.7%) had no published research on AI/ML for NCD forecasting.
3.3 Disease-Specific Focus
The distribution of studies across NCD categories demonstrated pronounced emphasis on particular disease groups with substantial gaps in others (Figure 2). Cardiovascular diseases constituted the most frequently examined NCD category, addressed in 54 studies (42.5% of total). Within cardiovascular diseases, hypertension prediction was investigated in 31 studies, stroke forecasting in 15 studies, coronary artery disease in 12 studies, and heart failure in 9 studies (several studies examined multiple cardiovascular conditions).
Diabetes mellitus was examined in 49 studies (38.6%), representing the second most common disease focus. Of these, 38 studies focused on type 2 diabetes prediction, 8 studies addressed diabetes complications including nephropathy and retinopathy, and 3 studies examined gestational diabetes. Cancer represented the focus of 13 studies (10.2%), with breast cancer most frequently studied (5 studies), followed by cervical cancer (3 studies), colorectal cancer (2 studies), liver cancer (2 studies), and lung cancer (1 study).
Chronic respiratory diseases were substantially underrepresented, addressed in only 11 studies (8.7%). These included chronic obstructive pulmonary disease (6 studies) and asthma (5 studies). Other NCDs examined included chronic kidney disease (18 studies, 14.2%), mental health disorders (7 studies, 5.5%), liver disease (4 studies, 3.1%), and osteoarthritis (2 studies, 1.6%). Multiple studies addressed more than one NCD category simultaneously, examining co-morbidities or developing multi-disease prediction models.
3.4 AI/ML Methodological Approaches
Substantial diversity characterized the AI/ML algorithms and methodological approaches employed across included studies, though certain algorithms predominated (Table 3). Supervised machine learning approaches were utilized in 119 studies (93.7%), while unsupervised learning was employed in only 5 studies (3.9%) primarily for clustering patient subgroups, and hybrid approaches combining supervised and unsupervised methods were implemented in 3 studies (2.4%).
Among supervised learning algorithms, ensemble methods were most common, with random forest algorithms utilized in 40 studies (31.5% of all studies). Gradient boosting variants including XGBoost, LightGBM, and CatBoost were employed in 27 studies (21.3%). Support vector machines (SVM) were implemented in 31 studies (24.4%), while logistic regression as a baseline comparison model appeared in 28 studies (22.0%).
Deep learning architectures were employed in 24 studies (18.9%), with substantial variation in specific implementations. Feedforward neural networks were most common (11 studies), followed by convolutional neural networks (CNN) adapted for structured health data (7 studies), recurrent neural networks including Long Short-Term Memory (LSTM) networks for temporal sequence modelling (4 studies), and transformer architectures (2 studies). Naive Bayes classifiers were implemented in 15 studies (11.8%), decision trees in 19 studies (15.0%), and k-nearest neighbours in 12 studies (9.4%).
Notably, 76 studies (59.8%) compared performance across multiple algorithms, or combined them through ensemble or model-stacking approaches, to identify optimal models for specific contexts. However, only 41 studies (32.3%) provided detailed hyperparameter tuning procedures, and merely 34 studies (26.8%) employed systematic feature selection methodologies beyond clinical judgment.
Table 3: Distribution of AI/ML Algorithms Employed
| Algorithm Category | Specific Methods | Number of Studies | % of Total Studies |
|---|---|---|---|
| Ensemble Methods | Random Forest | 40 | 31.5% |
| | Gradient Boosting (XGBoost, LightGBM, CatBoost) | 27 | 21.3% |
| Support Vector Machines | Linear SVM, RBF SVM, Polynomial SVM | 31 | 24.4% |
| Deep Learning | Feedforward Neural Networks | 11 | 8.7% |
| | Convolutional Neural Networks | 7 | 5.5% |
| | Recurrent Neural Networks/LSTM | 4 | 3.1% |
| | Transformer Models | 2 | 1.6% |
| Decision Trees | Single Decision Trees, CART | 19 | 15.0% |
| Naive Bayes | Gaussian NB, Multinomial NB | 15 | 11.8% |
| K-Nearest Neighbors | - | 12 | 9.4% |
| Logistic Regression | Used as baseline comparator | 28 | 22.0% |
Note: Total exceeds 100% as many studies employed multiple algorithms
Table 3. Frequency of artificial intelligence and machine learning algorithms used in included studies
Distribution of AI/ML algorithms employed across 127 studies, categorized by algorithm type and specific methods. Random forest (n=40, 31.5%) and support vector machines (n=31, 24.4%) were most common. Supervised learning approaches dominated (93.7% of studies). Note: Totals exceed 100% as many studies employed multiple algorithms for comparison or ensemble methods.
3.5 Temporal Dimensions of Prediction
A critical finding concerned the temporal characteristics of prediction models developed across included studies. The substantial majority of studies (87 studies, 68.5%) focused on cross-sectional risk prediction or classification, estimating current or near-term individual risk rather than forecasting future population-level disease burden. These models typically predicted disease presence, likelihood of developing disease within 1-2 years, or risk category classification based on current patient characteristics.
Only 30 studies (23.6%) incorporated explicit temporal forecasting methodologies suitable for projecting future disease burden at population levels over medium to long-term horizons. Among these, short-term forecasting (1-3 years) was most common (18 studies), medium-term forecasting (3-5 years) was conducted in 9 studies, and long-term forecasting (>5 years) was attempted in only 3 studies. Time-series analysis approaches including ARIMA models, exponential smoothing, and recurrent neural networks were employed in 22 studies for temporal forecasting. Compartmental epidemiological models integrated with machine learning were utilized in 5 studies to project disease transmission dynamics.
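Of the time-series methods tallied above, simple exponential smoothing is the most compact to illustrate: the smoothed level is a weighted average of the newest observation and the previous level, and the final level serves as the one-step-ahead forecast. The sketch below is our own stdlib-only illustration; the function name and default smoothing parameter are assumptions, not code from any included study.

```python
def ses_forecast(series, alpha=0.3):
    """Simple exponential smoothing.

    Level update: l_t = alpha * y_t + (1 - alpha) * l_{t-1}.
    Returns the final level, which is the one-step-ahead forecast.
    alpha near 1 tracks recent observations; alpha near 0 smooths heavily.
    """
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level
```

ARIMA and recurrent-network forecasters generalize this idea by modelling trend, seasonality, and longer-range dependence rather than a single smoothed level.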
Ten studies (7.9%) focused exclusively on spatial rather than temporal prediction, mapping geographical variation in NCD burden without temporal forecasting components. These employed spatial analytics techniques including geographical information systems (GIS), spatial autocorrelation analysis, and geographically weighted regression to identify high-risk geographical areas.
3.6 Data Sources and Variables
Substantial heterogeneity characterized the data sources utilized for model development and validation across included studies (Table 4). Hospital-based electronic health records (EHRs) or clinical databases constituted the most frequent data source, employed in 65 studies (51.2%). These primarily originated from tertiary referral hospitals in urban centres, raising concerns regarding representativeness for broader populations. National or sub-national population health surveys including Demographic and Health Surveys (DHS), World Health Survey, and country-specific non-communicable disease risk factor surveys were utilized in 36 studies (28.3%).
Disease-specific registries including national cancer registries, diabetes registries, and cardiovascular disease surveillance systems provided data for 18 studies (14.2%). Longitudinal cohort studies contributed data to 14 studies (11.0%), while administrative claims data from health insurance systems were employed in 8 studies (6.3%). Integration of multiple data sources to enhance predictive accuracy was conducted in only 26 studies (20.5%), representing a significant missed opportunity given evidence that multi-source integration typically improves model performance.
The Global Burden of Disease Study data were utilized in 12 studies (9.4%) to extract mortality and disability-adjusted life-year estimates for model calibration or validation. However, primary data collection specifically for predictive modelling purposes was rare, conducted in only 9 studies (7.1%). Mobile health data collection platforms were employed in 5 studies (3.9%), representing emerging opportunities for real-time data acquisition.
Table 4: Data Sources Utilized for Model Development
| Data Source Category | Number of Studies | % of Total |
|---|---|---|
| Hospital Electronic Health Records | 65 | 51.2% |
| National/Sub-national Population Surveys | 36 | 28.3% |
| Disease-Specific Registries | 18 | 14.2% |
| Longitudinal Cohort Studies | 14 | 11.0% |
| Administrative Claims Data | 8 | 6.3% |
| Multiple Integrated Data Sources | 26 | 20.5% |
| Global Burden of Disease Study | 12 | 9.4% |
| Primary Data Collection | 9 | 7.1% |
| Mobile Health Platforms | 5 | 3.9% |
Table 4. Primary data sources used for AI/ML model development and validation
Types and frequency of data sources utilized in 127 included studies. Hospital-based electronic health records (n=65, 51.2%) were the predominant data source, followed by population-based surveys (n=36, 28.3%). Only 26 studies (20.5%) integrated multiple data sources. The heavy reliance on hospital-based data raises concerns regarding population representativeness and generalizability to broader African populations.
Regarding predictor variables incorporated into models, demographic factors (age, sex, geographical location) were included in all 127 studies (100%). Clinical variables including blood pressure, body mass index, and laboratory parameters were incorporated in 103 studies (81.1%). Behavioural risk factors including smoking, alcohol consumption, physical activity, and dietary patterns were included in 78 studies (61.4%). Socioeconomic variables including education, income, occupation, and household wealth indices were incorporated in 62 studies (48.8%). Environmental and contextual factors including air quality, urbanization, and healthcare access were included in 34 studies (26.8%). Genetic or family history variables were incorporated in 19 studies (15.0%).
Data quality assessment and preprocessing procedures were explicitly reported in 89 studies (70.1%). Missing data handling strategies varied considerably: complete case analysis (excluding records with missing values) was employed in 42 studies (33.1%), multiple imputation in 28 studies (22.0%), single imputation methods in 15 studies (11.8%), and model-based imputation using machine learning algorithms in 4 studies (3.1%). Concerningly, 38 studies (29.9%) did not explicitly report missing data handling procedures, and 23 studies (18.1%) reported minimal missing data (<5%) without describing handling methods.
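The two most frequently reported strategies above, complete-case analysis and single (mean) imputation, can be sketched in a few lines each. The functions below are our own stdlib-only illustrations on dictionary-shaped records; the names are hypothetical and not taken from any included study.

```python
def complete_case(rows):
    """Complete-case analysis: drop any record with a missing value.
    Simple, but discards information and can bias results when
    missingness is not completely at random."""
    return [r for r in rows if None not in r.values()]

def mean_impute(rows, col):
    """Single (mean) imputation for one numeric column: replace each
    missing value with the mean of the observed values. Preserves sample
    size but understates variability relative to multiple imputation."""
    observed = [r[col] for r in rows if r[col] is not None]
    m = sum(observed) / len(observed)
    return [{**r, col: m if r[col] is None else r[col]} for r in rows]
```

Multiple imputation extends the second idea by drawing several plausible values per missing entry and pooling model results across the completed datasets, which is why it is generally preferred when missingness is non-trivial.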
3.7 Model Performance and Validation
Model validation strategies and performance assessment varied substantially across included studies, with concerning limitations in external validation (Table 5). Internal validation through data splitting approaches (training and testing sets) was conducted in 114 studies (89.8%), with typical training-testing splits ranging from 70:30 to 80:20. Cross-validation techniques including k-fold cross-validation (typically k=5 or k=10) were employed in 73 studies (57.5%) to enhance reliability of performance estimates.
However, external validation (assessment of model performance in datasets independent of those used for model development) was conducted in only 23 studies (18.1%). Among these, temporal external validation using data from different time periods was performed in 14 studies, geographical external validation using data from different healthcare facilities or regions was conducted in 11 studies, and both temporal and geographical external validation was implemented in only 2 studies. The limited external validation severely restricts conclusions regarding model generalizability across diverse African contexts.
Table 5: Validation Strategies Employed
| Validation Strategy | Number of Studies | % of Total |
|---|---|---|
| Internal Validation (Train-Test Split) | 114 | 89.8% |
| Cross-Validation (k-fold) | 73 | 57.5% |
| External Validation (Any Type) | 23 | 18.1% |
| - Temporal External Validation | 14 | 11.0% |
| - Geographical External Validation | 11 | 8.7% |
| - Both Temporal and Geographical | 2 | 1.6% |
| Bootstrap Validation | 18 | 14.2% |
| No Explicit Validation Reported | 13 | 10.2% |
Table 5. Model validation strategies reported in included studies
Validation approaches used to assess predictive model performance in 127 studies. Internal validation through train-test splitting (n=114, 89.8%) and cross-validation (n=73, 57.5%) were common. However, external validation was rare (n=23, 18.1%), with only 14 studies (11.0%) conducting temporal validation and 11 studies (8.7%) conducting geographical validation. Limited external validation substantially constrains conclusions regarding model generalizability across diverse African contexts.
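The internal-validation strategies tallied above rest on one mechanical idea: partitioning records into folds so that every record is tested exactly once while the model is trained on the remainder. A minimal stdlib-only sketch of a k-fold index splitter follows; it is our own illustration, not code from any included study.

```python
def kfold_indices(n, k=5):
    """Yield (train, test) index lists for k-fold cross-validation.

    The n records are split into k contiguous folds of near-equal size;
    each fold serves once as the test set while the rest form the
    training set, so every record is tested exactly once.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    idx, start = list(range(n)), 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size
```

In practice indices are shuffled (or stratified by outcome) before splitting; external validation differs in kind, not mechanics, because the test data come from a different time period or site altogether.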
Performance metrics reported demonstrated substantial diversity, complicating cross-study comparisons. Area under the receiver operating characteristic curve (AUC-ROC) was the most frequently reported metric, presented in 98 studies (77.2%), with values ranging from 0.72 to 0.98 (median 0.86, interquartile range 0.81-0.91). Accuracy was reported in 86 studies (67.7%), ranging from 68% to 97% (median 84%). Sensitivity was reported in 79 studies (62.2%), specificity in 76 studies (59.8%), positive predictive value in 58 studies (45.7%), and negative predictive value in 54 studies (42.5%).
More sophisticated performance metrics were less commonly reported. F1-score, which balances precision and recall, was reported in 41 studies (32.3%). Calibration assessment, crucial for evaluating whether predicted probabilities correspond to observed outcomes, was conducted in only 28 studies (22.0%) through Hosmer-Lemeshow tests or calibration plots. Precision-recall curves, particularly important for imbalanced datasets common in disease prediction, were presented in 19 studies (15.0%).
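As context for the AUC-ROC values reported above, the metric has a simple rank interpretation: the probability that a randomly chosen case receives a higher model score than a randomly chosen non-case (ties count one half). The stdlib-only sketch below is our own illustration of that Mann-Whitney formulation, not code from any included study.

```python
def auc_roc(pos_scores, neg_scores):
    """AUC-ROC via its rank interpretation: the probability that a
    randomly chosen positive case is scored above a randomly chosen
    negative case, counting ties as 0.5. O(n*m); fine for illustration."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

A value of 0.5 is chance-level ranking and 1.0 is perfect separation, which is why the reported median of 0.86 indicates good, but not exceptional, discrimination; note that discrimination says nothing about calibration, the separate property assessed in only 22.0% of studies.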
Model interpretability and explainability approaches were addressed in 47 studies (37.0%). Feature importance ranking was provided in 38 studies (29.9%), with methods including random forest variable importance, permutation importance, or coefficient magnitudes. Shapley Additive Explanations (SHAP) values for individual prediction explanation were employed in 11 studies (8.7%). Partial dependence plots illustrating relationships between individual features and predictions were presented in 8 studies (6.3%). Concerningly, 80 studies (63.0%) employed complex "black box" algorithms without providing interpretability analysis, limiting clinical utility and trust.
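Of the interpretability methods tallied above, permutation importance is the most model-agnostic: shuffle one feature column to break its association with the outcome and measure the resulting drop in a performance metric. The sketch below is our own stdlib-only illustration; the function and its signature are assumptions, not code from any included study.

```python
import random

def permutation_importance(predict, X, y, metric, col, n_repeats=20, seed=0):
    """Mean drop in a performance metric when feature column `col` is
    shuffled across records, breaking its link to the outcome. Works for
    any model exposing a per-row `predict` callable ("black box" included)."""
    rng = random.Random(seed)
    base = metric(y, [predict(row) for row in X])
    drops = []
    for _ in range(n_repeats):
        perm = [row[col] for row in X]
        rng.shuffle(perm)
        Xp = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, perm)]
        drops.append(base - metric(y, [predict(row) for row in Xp]))
    return sum(drops) / n_repeats
```

SHAP values refine this idea to per-prediction attributions, which is what makes them attractive for explaining individual risk estimates to clinicians.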
Comparative assessment against traditional statistical models or existing clinical risk scores was conducted in 67 studies (52.8%). Machine learning approaches generally demonstrated superior performance compared to logistic regression models, with AUC-ROC improvements ranging from 0.03 to 0.17 (median improvement 0.08). Comparisons with established clinical risk scores (e.g., Framingham Risk Score for cardiovascular disease, FINDRISC for diabetes) showed ML models outperformed traditional scores in 42 of 48 comparisons (87.5%).
3.8 Implementation and Real-World Deployment
A pronounced gap existed between model development and real-world implementation. Among the 127 included studies, the vast majority (104 studies, 81.9%) described models that remained at the development or validation stage without deployment in clinical or public health settings. Only 23 studies (18.1%) reported pilot implementation or deployment activities. Of these, 15 studies described integration into electronic health record systems for clinical decision support, 5 studies reported deployment in mobile health applications for community-level screening, and 3 studies described implementation in public health surveillance systems for population monitoring.
Barriers to implementation were explicitly discussed in 34 studies (26.8%). Commonly cited barriers included: insufficient digital infrastructure and unreliable internet connectivity (mentioned in 23 studies), limited interoperability between health information systems (19 studies), inadequate training and digital literacy among healthcare providers (17 studies), concerns regarding data privacy and security (14 studies), lack of regulatory frameworks for AI in healthcare (12 studies), absence of sustainable funding mechanisms (11 studies), and cultural resistance or mistrust toward AI technologies (7 studies).
Only 11 studies (8.7%) conducted any form of economic evaluation, including cost-effectiveness analysis or budget impact assessment. Among these, 8 studies reported that AI-driven prediction and early intervention strategies were potentially cost-effective compared to standard care, with incremental cost-effectiveness ratios below country-specific willingness-to-pay thresholds. However, the limited economic evidence constrains policy decision-making regarding resource allocation for AI implementation.
Equity considerations were explicitly addressed in 18 studies (14.2%). These examined potential disparities in model performance across population subgroups defined by socioeconomic status, geographical location (urban versus rural), sex, or ethnicity. Notably, 7 studies identified significant performance variation across subgroups, with models generally performing better for urban, higher socioeconomic status populations, potentially exacerbating existing health inequities. Algorithmic fairness assessment and bias mitigation strategies were employed in only 4 studies (3.1%).
Ethical considerations beyond equity, including informed consent for data use, potential harms from false predictions, and governance mechanisms, were discussed in 22 studies (17.3%). Data privacy protection approaches varied, with 47 studies (37.0%) describing de-identification procedures, 18 studies (14.2%) implementing differential privacy techniques, and 8 studies (6.3%) utilizing federated learning approaches enabling model training without centralized data aggregation.
3.9 Methodological Quality Assessment
Quality assessment using the adapted PROBAST tool revealed moderate to high risk of bias in substantial proportions of included studies (Table 6). In the participants domain, 34 studies (26.8%) were rated as having high risk of bias, primarily due to highly selected populations (e.g., single tertiary hospital without population representativeness) or inadequate description of source populations. The predictors domain showed high risk of bias in 29 studies (22.8%), most commonly due to predictor measurement inconsistencies or lack of standardization across data sources.
The outcome domain demonstrated high risk of bias in 41 studies (32.3%), predominantly due to unclear outcome definitions, variable outcome ascertainment methods, or insufficient follow-up duration for temporal forecasting studies. The analysis domain exhibited the highest proportion of high risk of bias, with 56 studies (44.1%) rated as high risk, most frequently due to inadequate sample sizes, lack of appropriate validation, or failure to account for overfitting through regularization or cross-validation.
Overall, considering the highest risk domain for each study, 63 studies (49.6%) were rated as having high risk of bias, 49 studies (38.6%) moderate risk, and only 15 studies (11.8%) low risk of bias. These findings underscore the need for enhanced methodological rigor in future predictive modelling research for NCD forecasting in Africa.
Table 6: Risk of Bias Assessment Using Adapted PROBAST
| PROBAST Domain | Low Risk | Moderate Risk | High Risk |
|---|---|---|---|
| Participants | 58 (45.7%) | 35 (27.6%) | 34 (26.8%) |
| Predictors | 72 (56.7%) | 26 (20.5%) | 29 (22.8%) |
| Outcome | 53 (41.7%) | 33 (26.0%) | 41 (32.3%) |
| Analysis | 38 (29.9%) | 33 (26.0%) | 56 (44.1%) |
| Overall Rating | 15 (11.8%) | 49 (38.6%) | 63 (49.6%) |
Table 6. Risk of bias assessment across four PROBAST domains
Summary of methodological quality assessment for 127 included studies using the adapted Prediction model Risk Of Bias Assessment Tool (PROBAST). Studies were rated as low, moderate, or high risk of bias across four domains: participants, predictors, outcome, and analysis. Overall ratings reflect the highest risk domain for each study. Only 15 studies (11.8%) were rated low risk overall, while 63 studies (49.6%) had high risk of bias, primarily due to limitations in the analysis domain including inadequate sample sizes, lack of validation, and failure to address overfitting.
3.10 Collaborative Networks and Research Capacity
Analysis of author affiliations and collaborative patterns revealed concerning concentration of research capacity in particular institutions and limited South-South collaboration. Among the 127 included studies, primary authorship was affiliated with institutions in high-income countries (primarily United States, United Kingdom, and Europe) in 48 studies (37.8%), despite the studies focusing on African populations and data. African-led research (first and senior authors from African institutions) represented 79 studies (62.2%).
Institutional analysis identified several high-output research centres: University of Cape Town and associated South African institutions contributed to 22 studies (17.3%), University of Nairobi and Kenyan research institutions to 14 studies (11.0%), University of Ibadan and Nigerian institutions to 11 studies (8.7%), and University of the Witwatersrand to 9 studies (7.1%). Concentration of research capacity in these institutions highlights both centres of excellence and the need for capacity strengthening across broader African research networks.
North-South collaborations (partnerships between African and high-income country institutions) were present in 67 studies (52.8%). However, South-South collaborations involving researchers from multiple African countries were relatively rare, identified in only 18 studies (14.2%). Continental research networks or multi-country African consortia were absent from the identified literature, representing a significant missed opportunity for knowledge sharing, resource optimization, and harmonization of approaches.
Funding sources were reported in 94 studies (74.0%). International development agencies and research funding organizations (including the Wellcome Trust, NIH, and European Commission) funded 42 studies (33.1%), African government research councils funded 28 studies (22.0%), institutional funds supported 18 studies (14.2%), and private sector or philanthropic organizations funded 6 studies (4.7%). Notably, 33 studies (26.0%) did not report funding sources, raising transparency concerns.
3.11 Identified Knowledge Gaps
Systematic analysis of the evidence landscape revealed several critical knowledge gaps requiring prioritization in future research efforts:
Geographical Gaps:
- Complete absence of research from 36 African countries (66.7%), particularly in Central Africa, francophone West Africa, and lusophone countries
- Limited multi-country or continental-scale studies enabling comparison and generalization
- Insufficient representation of rural and remote populations in model development datasets
- Lack of research addressing unique contexts of conflict-affected and fragile states
Disease-Specific Gaps:
- Substantial underrepresentation of chronic respiratory diseases (8.7% of studies) despite significant burden
- Limited focus on specific cancer types beyond breast and cervical cancer
- Insufficient attention to mental health disorders (5.5% of studies) despite rising burden
- Neglect of multi-morbidity prediction, despite co-occurrence of multiple NCDs being common
- Limited research on less visible NCDs including chronic kidney disease, liver disease, and rheumatological conditions
Methodological Gaps:
- Predominance of cross-sectional risk prediction over longitudinal burden forecasting suitable for health system planning
- Limited application of advanced time-series forecasting methods and deep learning architectures
- Insufficient external validation (only 18.1% of studies), severely limiting generalizability conclusions
- Rare use of causal inference approaches to distinguish prediction from causation
- Limited ensemble methods combining diverse data sources and algorithms for enhanced accuracy
Data Infrastructure Gaps:
- Heavy reliance on hospital-based data sources with questionable population representativeness
- Insufficient integration of multiple data sources (only 20.5% of studies)
- Limited utilization of novel data streams including mobile health, wearable devices, and environmental sensors
- Inadequate data quality assessment and standardization procedures
- Absence of continent-wide data sharing platforms or federated learning infrastructures
Implementation Science Gaps:
- Pronounced gap between model development and real-world deployment (only 18.1% pilot implementation)
- Limited evidence on implementation barriers, facilitators, and strategies
- Insufficient economic evaluation to inform policy resource allocation decisions
- Inadequate assessment of equity implications and algorithmic fairness
- Limited engagement with end-users including policymakers, healthcare providers, and patients in model development
Validation and Interpretability Gaps:
- Rare temporal validation assessing model performance over time as epidemiology evolves
- Limited geographical external validation across diverse African contexts
- Insufficient model interpretability and explainability for clinical adoption
- Inadequate calibration assessment beyond discrimination metrics
- Limited comparison with existing clinical risk scores and traditional forecasting approaches