Study Area
The study area encompasses South America, a continent with remarkable climatic and topographic diversity, ranging from humid tropical forests to arid and savanna regions (Delgado et al. 2022). This environmental heterogeneity directly influences species distribution, creating suitable conditions for historical processes of speciation, isolation, and endemism (Wei et al. 2025). Moreover, South America was strongly affected by Quaternary climatic oscillations, whose effects shaped present biodiversity, particularly in ecologically significant ecosystems such as the Amazon (Silva et al. 2018).
The Amazon, the largest tropical rainforest on the planet, covers approximately 5 million km², of which about 60% lies within Brazilian territory, distributed across nine states and 553 municipalities (Fearnside 1995). It is characterized byan extensive hydrographic network formed bymajor rivers thatdefine interfluves and, consequently, centers of endemism (Silva et al. 2005). The biome is recognized for its megadiversity but is currentlysubject to intense anthropogenic pressure. It is estimated that more than 20% of its area has already been deforested, mainly in the so-called “arc of deforestation,” a region marked by agricultural expansion, cattle ranching, mining, and infrastructure development (Almeida et al. 2010; Moraes 2019). Forest degradation, combined with the effects of climate change, threatens not only local biodiversity but also global climate balance, making the Amazon a strategic setting for species distribution modeling studies.
Within this context, the Belém center of endemism stands out, located in eastern Pará (near the Tocantins River) and western Maranhão (near the Pindaré River). It is the smallest center of endemism in the Amazon, with an area of approximately 243,000 km², and is considered one of the most threatened. More than 76% of its forest cover is already under severe deforestation pressure (Almeida et al. 2010; Moraes 2019). This center of endemism is particularly relevant because it harbors the species M. williamsoni (Garrison 2006), a damselfly specialized in primary forest environments. The occurrence of this species in the region reflects not only historical isolation processes but also its current vulnerability to anthropogenic pressures, justifying its selection as a model for evaluating historical, current, and future distribution patterns.
Occurrence records
The damselfly Mnesarete williamsoni (Garrison 2006) was selected as the target species due to its strong ecological association with riparian vegetation in primary forest streams (Carvalho et al. 2018; Carvalho 2019). This specialist behavior makes it highly sensitive to environmental changes, particularly those related to climate variability and land-use conversion, such as deforestation and forest degradation (Carvalho et al. 2018; Carvalho 2019). As a result, M. williamsoni is considered an effective bioindicator of preserved ecosystems, serving as a reliable predictor of environmental changes in both climatic and landscape contexts (Sagova-Mareckova et al. 2020). In addition, it is a taxonomically stable species, with no synonymy conflicts, and it is relatively easy to sample and identify in the field (Carvalho et al. 2018; Carvalho 2019), which enhances its applicability in ecological modeling.
To build the occurrence dataset, we corroborated the taxonomic identity of the target species. We compiled presence records from personal databases, complemented by secondary data obtained from the Global Biodiversity Information Facility (GBIF.org, 2022: https://gbif.org). All available presence records of M. williamsoni were retrieved without filtering by collection date, using the download link provided by the GBIF platform (DOI:10.5962/p.210560).
We applied a geographical mask encompassing the entire South American continent, and using QGIS (version 3.14; Quantum GIS Geographic Information System; QGIS.org 2020) and the R platform (R Core Team 2020), we filtered thepresence-only occurrence dataset Records lacking geographic coordinates, duplicates, erroneous entries, or those located outside the study area were excluded. To minimize potential bias caused by spatial autocorrelation (SAC), we applied a thinning procedure, retaining only one occurrence per grid cell. Finally, to pre-assess the spatial patterns of the species distribution, we used the Epanechnikov kernel density function (KDE) through the QGIS “Heatmap” extension, with a bandwidth of 200,000 km.
Environmental predictor variables
From the WorldClim platform (https://worldclim.org: Hijmans et al. 2005), we extracted 19 climate variables that allowed us to represent current climate dynamics (CURRENT: 1960–1990). For past and future climate scenarios, we selected global circulation models (CCSM4, MPI-ESM-P, and MIROC-ESM) to characterize past (LGM: ~22,000 years) and future (FUTURE: 2041–2060) climates scenarios. We selected these three global circulation models for their robustness and compatibility for the LGM and for the future, allowing for fidelity and correspondence when comparing predictions. For future climate projections, we selected the most extreme scenario for continued greenhouse gas emissions (Representative Carbon Pathway, RCP = 8.5). RCP 8.5 was adopted by the Intergovernmental Panel on Climate Change in 2004 and represents the most pessimistic and likely scenario for projections to the year 2100 (Frick et al. 2020; Schwalm et al. 2020; Ritchie et al. 2021). We selected the predictor variables for all climate scenarios with a spatial resolution of 5 arc minutes (~ 10 km at the equator), in order not to compromise data processing capacity and to ensure a correct representation of regional climate dynamics (Barve et al., 2011).
To avoid unequal weights between the predictor variables, from the R platform and using the 'maps' (Becker and Wilks 1993), 'rgdal' (Keitt 2010) and 'raster' (Hijmans 2015) packages, we standardized the predictor variables (z ~ Normal distribution: zero mean and standard deviation of |1|). We delimited the modelling area with a South American geographic extension to incorporate all potential climatic zones for the presence and distribution of the target species (see M Barve et al. 2011). From a cut mask with a South American extension and using the R platform, we extracted all predictor variables using the 'raster' and 'SDMTools' packages (Vanderwal et al., 2014). Finally, we performed a principal component analysis (PCA) to reduce the collinear effects of the predictor variables (De Marco and Nóbrega 2018). We chose only the first eight axes that explained more than 95% of the original environmental variation of the predictor variables considered.
Processing and evaluation of models
We applied an ensemble model approach, where the combination of different algorithms improves the approximation to the maximum and minimum thresholds of tolerance and the distribution of species, allowing better approximations to their niches (Allouche et al. 2006; Barve et al. 2011; Grenouillet et al. 2011; Alvarez et al. 2021). From the configuration of the 'ENMTML' package (from Andrade et al. 2020: https://github.com/andrefaa/ENMTML) and based on the performance of the algorithms and the flexibility with respect to input data (Elith et al. 2006), we selected the algorithms Maxent (Maximum Entropy - MXS: Phillips et al. 2006), Random Forest (RF: James et al. 2013) and Support Vector Machine (SVM: Salcedo-Sanz et al. 2014). We defined the pseudo-absences as environmentally random, spatially non-overlapping, and not coinciding with the presence-only data of the target species. We divided the presence-only logs into 70% for use as test data and 30% for use as training data. We configured MXS using linear features and with default parameterization, a thousand interactions, 10 thousand background points, and logistic-like output (Elith et al. 2006). We defined the RF parameters on 500 trees, using automatic selection for both the fit and the algorithm (Hastie et al., 2009). We configured SVM by default, using Kernel's linear fit function with probabilistic output. Following de Andrade et al. (2020), we apply ENMTML and run the different SDMs from the R platform, using the ‘dismo’ (Hijmans et al. 2017), 'randomForest' (Liaw and Wiener 2002) and ‘kernlab’ (Karatzoglou et al. 2004) packages.
We selected a randomization process for 30 replicates for each algorithm (self-initialization analysis) to reduce possible correlation errors, avoid biased evaluations, and increase the statistical power of the predictions. We cut continuous predictions using the threshold where Sorensen values are greatest. Sorensen performs better since the index values remain similar regardless of the species prevalence; therefore, it is a more appropriate metric of the model's discrimination ability (Leroy et al. 2018). To identify and define the presence/absence of the target species in the study area, we transformed continuous predictions into binary ones (Barve et al. 2011; Grenouillet et al. 2011). We concatenated the binary predictions, using the weighted average consensus, considering all predictions with Sorensen values above the average value. That is, we used the threshold that maximizes the sum of the sensitivity and specificity to transform continuous models (current or forecasted) into a binary (presence-absence) model (Velazco et al. 2019).