Leprosy (Hansen’s disease), a neglected tropical disease (NTD), remains a global public health concern. In 2024, 172,717 new leprosy cases were reported from 188 countries and territories, with the highest burden in low- and middle-income regions.1 Diagnosis is often complex due to the wide clinical spectrum, which ranges from subtle hypochromic macules in paucibacillary forms to diffuse nodular or infiltrative lesions in multibacillary presentations. Reactional states—immune-mediated inflammatory episodes that may occur before, during, or after treatment—add further diagnostic complexity and contribute to misclassification and delayed management.2
Limited specialist availability in endemic regions contributes to substantial diagnostic delays, particularly in remote or resource-constrained settings where frontline providers frequently have limited training in recognising early or atypical manifestations of leprosy.3,4 These structural gaps are often compounded by persistent stigma, fragmented referral pathways, and operational constraints within primary health services, all of which hinder timely case identification. Consequently, many patients are diagnosed only after the onset of nerve damage and visible deformities, increasing the risk of long-term disability and perpetuating transmission in affected communities.5
Advances in artificial intelligence (AI) have created opportunities to support the diagnosis of dermatological conditions and NTDs.6,7 Techniques such as convolutional neural networks, transfer learning,8,9 and, more recently, multimodal foundation models integrating large language models with Vision Transformers,10 have substantially expanded the capacity to analyse skin lesion images at scale. These systems have demonstrated high concordance with dermatologists in experimental settings and, in some narrowly defined diagnostic tasks, have even surpassed human performance.11–13
The first AI-based approach for leprosy diagnosis demonstrated the feasibility of lesion recognition through image analysis, achieving an accuracy of 91.6%.14 A subsequent study from Brazil reinforced the potential of AI by combining skin images with demographic and clinical data, reaching 90% accuracy and highlighting the added value of multimodal inputs for more robust classification.15 A pilot study in Côte d’Ivoire and Ghana extended these findings to a broader spectrum of skin NTDs: it applied deep learning to Buruli ulcer, leprosy, mycetoma, scabies, and yaws, and reported over 70% agreement with clinical diagnoses.16 Despite these encouraging results, most existing studies are constrained by relatively small samples and limited image diversity, underscoring the need for external evaluation in larger, more representative datasets.
In parallel, digital health tools, particularly those incorporating AI, have emerged as a promising strategy to mitigate diagnostic gaps in NTDs. To address case management challenges in frontline settings, the World Health Organization (WHO) developed the Skin NTDs app, a mobile tool that combines offline clinical algorithms with structured learning resources for health workers. The app covers twelve skin NTDs—leprosy, Buruli ulcer, yaws, cutaneous leishmaniasis, post-kala-azar dermal leishmaniasis, onchocerciasis, lymphatic filariasis, mycetoma, chromoblastomycosis, sporotrichosis, scabies, and tungiasis—as well as 24 common skin conditions.17 In evaluations conducted in Ghana and Kenya, the app achieved a mean quality score of 4.02 (standard deviation 0.47) out of 5, supporting its perceived usefulness as a training and decision-support tool.18,19
A beta version of the WHO Skin NTDs app incorporates two cascaded AI-based visual classifiers, developed in collaboration with UniversalDoctor and Belle.ai.20 Both models are built on convolutional neural network architectures. The first classifier acts as a broad skin-condition screener, assigning probabilities across 24 common dermatological conditions and indicating when a lesion may warrant further evaluation for a skin NTD. When a skin NTD is suspected, the second classifier is applied; this NTD-focused model assigns probabilities across the twelve skin NTD categories listed above and returns a ranked list of four suggested NTD diagnoses for each uploaded image. Together, these AI components enable the app to function as both a clinical decision-support tool and a scalable training platform for non-specialist health workers. Users can capture and upload anonymised images of skin lesions and obtain a shortlist of possible diagnoses to be integrated with clinical history and examination findings.7,21
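The two-stage logic described above can be illustrated with a minimal sketch. It is not the app's implementation: the function names, the `ntd_suspicion` score, and the 0.5 threshold are hypothetical, since the rule the screener uses to flag a possible skin NTD is not specified here. Only the cascade itself and the four-item ranked shortlist are taken from the description.

```python
from typing import Dict, List, Optional, Tuple


def top_k(probs: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k labels with the highest probability, best first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


def cascade(
    ntd_suspicion: float,          # hypothetical first-stage score for "possible skin NTD"
    ntd_probs: Dict[str, float],   # second-stage probabilities over the skin NTD labels
    threshold: float = 0.5,        # assumed cut-off; the real rule is not described
    k: int = 4,                    # the app displays four suggestions
) -> Optional[List[Tuple[str, float]]]:
    """Run the NTD-focused stage only when the screener flags a possible skin NTD."""
    if ntd_suspicion < threshold:
        return None  # no NTD shortlist is produced for this image
    return top_k(ntd_probs, k)


# Illustrative values only, not study data.
example_probs = {
    "leprosy": 0.40,
    "scabies": 0.30,
    "yaws": 0.15,
    "mycetoma": 0.10,
    "buruli_ulcer": 0.05,
}
shortlist = cascade(ntd_suspicion=0.9, ntd_probs=example_probs)
```

In this sketch the shortlist is simply the top-ranked slice of the second-stage output, so a longer list can be obtained by changing `k`.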
Although the app displays four diagnostic suggestions in routine use, the underlying classifier generates five ranked outputs; these are accessible in the DHIS2 evaluation interface and were used for the analytical validation reported here. All analyses refer exclusively to the second-stage model, the NTD classifier.
To our knowledge, no previous peer-reviewed study has reported an external analytical validation of the AI-based NTD classifier integrated into the WHO Skin NTDs app. Using an independent dataset of clinically confirmed leprosy cases and their reactional forms, we assessed the classifier’s ability to identify leprosy among the twelve NTD output labels—focusing on Top-5 sensitivity and error patterns—and evaluated its potential utility as a digital health intervention supporting global efforts to control leprosy.
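For readers unfamiliar with the metric, Top-5 sensitivity can be read as the proportion of confirmed leprosy images for which "leprosy" appears among the classifier's five ranked outputs. The sketch below shows that calculation under stated assumptions: the function name, the label strings, and the example rankings are hypothetical and are not study data.

```python
from typing import Sequence


def top_k_sensitivity(
    ranked_outputs: Sequence[Sequence[str]],  # one ranked label list per confirmed case
    target: str = "leprosy",                  # assumed label string for the target disease
    k: int = 5,
) -> float:
    """Fraction of confirmed cases whose top-k ranked labels include the target."""
    if not ranked_outputs:
        raise ValueError("at least one ranked output is required")
    hits = sum(1 for ranking in ranked_outputs if target in ranking[:k])
    return hits / len(ranked_outputs)


# Illustrative rankings for three confirmed leprosy images (made up).
rankings = [
    ["leprosy", "scabies", "yaws", "mycetoma", "tungiasis"],
    ["scabies", "yaws", "leprosy", "mycetoma", "tungiasis"],
    ["scabies", "yaws", "mycetoma", "tungiasis", "sporotrichosis"],
]
top5 = top_k_sensitivity(rankings, k=5)
top1 = top_k_sensitivity(rankings, k=1)
```

Because every image in such a dataset is a confirmed case, the denominator is the number of images, and lowering `k` can only keep the value the same or reduce it.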