2.1. Pollen Observations
Pollen observations in the UK are compiled from a network of monitoring stations, which has been run by the Met Office since 2011. Up to 20 stations observed pollen between 2011 and 2023 (Fig. 1). However, the number of stations gradually reduced from a peak of 19 in 2011 to just 11 sites by 2023 and was severely affected by COVID-19 lockdown restrictions in 2020, when only 8 stations were active.
All the pollen monitoring stations are run by expert volunteers, following the procedures set out by Caulton et al. (2011). Pollen is collected by Burkard volumetric spore traps designed by Hirst (Hirst, 1952). From late March until early September, the traps are changed at 9am local time, daily in the grass season (mid-May to early August) and weekly outside this period. Pollen and other airborne particles are drawn in through a slot on the Burkard trap and stick to an adhesive film attached to a slowly rotating drum. The drum rotates at a set speed, so the length of film exposed per hour can be determined. Every day or week the sample drum is collected and replaced with one loaded with clean film. In a laboratory, the sample film is removed from the drum and sectioned into 24-hour lengths, which are mounted onto microscope slides. Under a microscope, pollen grains are identified and systematically counted in two-hour traverses. Both bihourly and daily total pollen counts are converted into concentrations in air (grains m⁻³), with the daily pollen count defined and recorded as the average pollen concentration in the 24-hour period from 9am local time (which differs from the pollen count defined by Galán et al. (2017)). Bihourly pollen concentrations are likely to exhibit higher variability and uncertainty than daily concentrations because fewer grains are counted over each bihourly traverse. However, the sub-daily concentrations can still be informative when considering a long time period or multiple sites. The bihourly concentrations are also used to calculate daily mean concentrations from 00Z (as opposed to the 9am start used in the pollen count), which are used in this paper to allow direct comparison to the NPARU-MO forecast.
Pollen from 12 plant taxa (species, genera or family, where species cannot be easily identified by microscopy) are routinely observed - grass (family), 8 trees (including birch and oak) and 3 weeds (Table S2). For development of model parametrisations, observations from 2011–2021 were used, while 2022 and 2023 were used to examine the model performance.
For the development of emission parametrisations, quality control steps were carried out on the observations. These included only keeping sites that have a) at least 75% data coverage in the peak season (Table S1), b) an Annual Pollen Integral (APIn, the total sum of daily average pollen concentrations for the year) of ≥ 10% mean APIn across all sites for the given year, c) an APIn of ≥ (mean – standard deviation) across all sites for the given year; these last two remove any erroneously low count sites where the quantity of data may not be sufficient to derive accurate trends and therefore parametrisations from.
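The three quality control checks can be sketched as a single filter. This is a minimal illustration rather than the processing code used for the paper: the function name and arguments are hypothetical, coverage is expressed as a fraction, and the sample standard deviation is assumed because the text does not say which form was used.

```python
from statistics import mean, stdev


def passes_quality_control(site_apin, all_site_apins, peak_coverage):
    """Return True if one site-year passes checks a) to c).

    site_apin      -- APIn for the site in the given year
    all_site_apins -- APIn values for all sites in that year
    peak_coverage  -- fraction (0 to 1) of peak-season days with data
    """
    mean_apin = mean(all_site_apins)
    sd_apin = stdev(all_site_apins)  # assumption: sample standard deviation
    return (
        peak_coverage >= 0.75                    # a) data coverage
        and site_apin >= 0.10 * mean_apin        # b) at least 10% of mean APIn
        and site_apin >= mean_apin - sd_apin     # c) at least mean minus one SD
    )
```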
2.2. Daily Pollen Level / Index
Observed daily mean concentrations for each taxon were combined and converted into the categorical Daily Pollen Level (DPL) used in the UK. Four levels were defined: Low, Moderate, High (all hay fever sufferers are likely to be affected; Adams-Groom et al., 2020) and Very High (sufferers are likely to have extensive symptoms). To facilitate quantitative verification, these categories are also referred to numerically as the Daily Pollen Index (DPI), where 1 represents Low, through to 4, which represents a Very High Daily Pollen Level. Each taxon was individually converted to a DPI value (Table 1), with the maximum across all taxa then taken as the overall DPI. Conversion thresholds for additional pollen taxa, which can also be included in the DPI, are given in Table S2.
Table 1
Daily Pollen Index (DPI) for each taxon (Latin and common names) used in this study, and the associated pollen concentration (c) thresholds in pollen grains per m³ of air.
| Latin | Common name | Low (1) | Moderate (2) | High (3) | Very High (4) |
| Betula | Birch | c < 40 | 40 ≤ c < 80 | 80 ≤ c < 200 | c ≥ 200 |
| Quercus | Oak | c < 30 | 30 ≤ c < 50 | 50 ≤ c < 200 | c ≥ 200 |
| Poaceae | Grass | c < 30 | 30 ≤ c < 50 | 50 ≤ c < 150 | c ≥ 150 |
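The conversion in Table 1 can be expressed compactly. The sketch below is illustrative only: the dictionary and function names are hypothetical, and only the three taxa listed in Table 1 are included.

```python
# Lower bounds of the Moderate, High and Very High classes from Table 1
# (pollen grains per cubic metre of air).
DPI_THRESHOLDS = {
    "birch": (40, 80, 200),
    "oak": (30, 50, 200),
    "grass": (30, 50, 150),
}


def taxon_dpi(taxon, concentration):
    """Map a daily mean concentration to a DPI from 1 (Low) to 4 (Very High)."""
    # Each threshold that is reached raises the index by one.
    return 1 + sum(concentration >= bound for bound in DPI_THRESHOLDS[taxon])


def overall_dpi(concentrations):
    """Overall DPI is the maximum of the per-taxon DPI values."""
    return max(taxon_dpi(taxon, c) for taxon, c in concentrations.items())
```

For example, `overall_dpi({"birch": 10, "oak": 60, "grass": 20})` is driven by oak, which falls in the High class.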
2.3. NAME atmospheric dispersion model
The Numerical Atmospheric-dispersion Modelling Environment (NAME, Jones et al., 2007) is an atmospheric dispersion model that has been used for many different applications. For example, NAME has been used for modelling the dispersion of volcanic ash (Beckett et al., 2010), nuclear accidents (Draxler et al., 2015), biomass burning (Buus Hansen et al., 2019) and for biological agents such as bluetongue (Burgin et al., 2013) and wheat stem rust (Meyer et al., 2017).
NAME is an offline dispersion model which uses meteorological data from the UK Met Office’s operational numerical prediction model, the Unified Model (UM, Davies et al., 2005). For this work, we have used NAME version 8.4 which has been driven by high resolution 1.5km UM UKV (Tang et al., 2013; Bush et al., 2020) meteorology over the UK and Global UM (Walters et al., 2017) meteorology at approximately 10km resolution outside the UKV domain. NAME can be run either in Lagrangian mode, where ‘particles’ are released into the atmosphere and transported individually by the meteorology, or in Eulerian mode, where a gridded scalar field is evolved, and concentrations are therefore given directly instead of averaging over a grid box of particles as in the Lagrangian scheme. For pollen modelling we use a hybrid version where pollen is emitted as particles into the Lagrangian framework and then transferred to the Eulerian grid after five minutes. The modelling domain used covers the UK, Republic of Ireland and Northern France at a resolution of 0.05° (approximately 5km, illustrated in Fig. 6), with 38 vertical model levels up to 5 km. Skjøth et al. (2009) showed the potential of pollen monitored in London to have been transported from France, which illustrates the importance of including nearby parts of Europe.
Our model resolution of 0.05° is much finer than those used in Sofiev et al. (2015) (ranging from 0.15° to 0.25°); this is made feasible by our smaller geographical domain and is motivated by the need to estimate local variations. Pollen grains from each of the three taxa (birch, oak, grass) are treated as aerosols with diameters given in Table 3, which can undergo both dry and wet deposition, including sedimentation. Removal of pollen grains from the atmosphere by dry deposition is parameterised using a resistance-based deposition velocity (Webster & Thomson, 2011) and the treatment of wet deposition is based on the depletion equation using parametrised scavenging coefficients which vary depending on precipitation type (Webster & Thomson, 2014). NAME has been developed to include new parametrisations which allow taxa-level variation in pollen emissions depending on meteorology, which will be described in detail below.
NAME can also output meteorological variables at hourly resolution at specific locations such as at pollen measurement sites. This method has been used to extract meteorology at these sites from the Global UM for January 2011 – July 2013 and the UKV for dates since August 2013 which is used in the development of parametrisations discussed below.
2.4. Source emission maps
Pollen emission maps were developed using species distribution modelling (SDM) techniques. This statistical method allows explanatory data (such as land use and meteorology) to be fitted to known locations of species, given by ‘presence’ data. The fitted model can then also be used for projection onto different geographical domains (as required here for Northern France), or for different explanatory datasets, for example under future climate conditions.
Presence data were provided by the Botanical Society of Britain and Ireland (BSBI) for 11 common species of grass (Table 2), and 2 common native species of both birch (Betula pubescens and Betula pendula) and oak (Quercus robur and Quercus petraea). Data for the additional species Alnus glutinosa, Corylus avellana, Fraxinus excelsior, Platanus × acerifolia, six species of Salix, Ulmus glabra, Artemisia vulgaris, Urtica dioica and Ambrosia artemisiifolia were also provided. Data were provided as observation counts in a 2 km grid box for the period from 1970 to the present day for the United Kingdom, the Republic of Ireland, the Channel Islands and the Isle of Man. To exclude grid points with no relevant observations, which would bias the results, the BSBI observations were converted to binary presence or absence data, similar to the method used in Hill et al. (2017): a grid point is set to absent only if another species provided by BSBI was recorded there; if no species were recorded, that point is not included in the model.
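The presence or absence rule described above can be written as a three-way decision per grid point. This is a sketch with hypothetical names; the counts stand for BSBI observation counts in one 2 km grid box.

```python
def presence_absence(target_count, any_species_count):
    """Classify one grid point for a target species.

    target_count      -- observations of the target species in the grid box
    any_species_count -- observations of any BSBI-provided species there
    Returns 1 (present), 0 (absent) or None (excluded from the model).
    """
    if target_count > 0:
        return 1
    if any_species_count > 0:
        # Another species was recorded here, so the absence is informative.
        return 0
    # Nothing was recorded at all: no evidence either way, so drop the point.
    return None
```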
Three types of explanatory data were used. The first was land use from the CORINE 2018 dataset (CORINE, 2018), which is available at 100m resolution, but re-gridded to 0.05 degrees as a percentage covered by each land use type. The major CORINE land use types were grouped for birch and oak into “broadleaved forest”, “mixed forest”, “coniferous forest” and “urban”, while for grasses these were grouped as “forest”, “urban”, “urban grass”, “rural” and “rural grass” (Table S3). Soil types were also used as explanatory data, with topsoil properties used from the European Soil Database Derived data (ESDAC, 2013; Hiederer, 2013). Finally, 30 arc-second resolution bioclimatic variables from WorldClim (Fick & Hijmans, 2017) were also included, limited to the least-correlated fields - isothermality, mean temperature of warmest quarter, mean temperature of coldest quarter, annual precipitation and elevation.
The R package biomod2 (Thuiller, 2003) was used to fit an ensemble of SDM models and the probability of occurrence was projected onto the required 0.05° grid. Probability of occurrence was calculated as the mean value predicted by each model in the ensemble, weighted by the true skill statistic of each model, calculated using 5 cross-validation runs using 70% training data and 30% validation data. This process identifies grid points where the species could occur and the probability of this.
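The ensemble weighting can be illustrated for a single grid point. The sketch below is written in Python for consistency with the other examples in this section and does not use or reproduce the biomod2 interface; the function name and inputs are hypothetical.

```python
def weighted_probability(predictions, tss_scores):
    """Mean of the model predictions, weighted by each model's true skill statistic.

    predictions -- probability of occurrence from each model in the ensemble
    tss_scores  -- matching true skill statistic for each model
    """
    total_weight = sum(tss_scores)
    weighted_sum = sum(p * w for p, w in zip(predictions, tss_scores))
    return weighted_sum / total_weight
```

With predictions of 0.2 and 0.8 and skill scores of 1.0 and 3.0, the more skilful model dominates and the weighted probability is about 0.65.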
The number of pollen grains assumed to be available for emission per mature tree was taken to be 1 × 10¹⁰ grains per year for both birch and oak, based on very limited information in the literature for species of oak not common in the UK, ranging from 1 × 10⁸ grains for Quercus rotundifolia (Tormo Molina et al., 1996) to 5.5 × 10¹⁰ grains for Quercus suber (Gómez-Casero et al., 2004). For each tree species, the emitted grains per km² were then calculated as the product of the probability of occurrence from the SDM modelling, the number of grains per tree and the estimated number of trees per km², which is dependent on land use (Table S4). For grasses, the number of grains released per inflorescence (flower head) varies depending on species (Table 2). There are limited data on grass inflorescence density, with most being from more arid habitats than the UK, such as 517 inflorescences m⁻² in Morocco (Aboulaich et al., 2009) and 410 inflorescences m⁻² in Southern Spain (Prieto-Baena et al., 2011). However, a grassland field experiment carried out by Bennie in 2017 (personal communication) gave a mean inflorescence density of 2200 inflorescences m⁻² in an ungrazed meadow in Cornwall; because this is more characteristic of the modelling region, this value is used. The emission map in grains km⁻² is derived for each grid point as the product of i) maximum probability of occurrence across all grass species, ii) probability of occurrence weighted average of all the individual species grains per inflorescence values, iii) inflorescence density and iv) estimated percentage of grass in each land use category (Table S4).
Table 2
Grains per inflorescence for the grass species used in the emission estimates. Sources: [1] Ali et al. (2022); [2] Prieto-Baena et al. (2011); [3] Severova et al. (2022); [4] Agrostis capillaris is assumed to have the same value as Agrostis stolonifera.
| Latin name | Common name | Pollen grains / inflorescence (×10⁶) | Source |
| Agrostis capillaris | Common bent grass | 2.400 | [4] |
| Agrostis stolonifera | Creeping bent grass | 2.400 | [1] |
| Anthoxanthum odoratum | Sweet vernal grass | 1.300 | [1] |
| Bromus hordeaceus | Soft brome | 2.450 | [2] |
| Dactylis glomerata | Cock’s-foot | 4.600 | [1] |
| Festuca arundinacea | Tall fescue | 11.700 | [2] |
| Holcus lanatus | Yorkshire fog | 5.300 | [1] |
| Lolium perenne | Perennial rye grass | 2.000 | [1] |
| Phleum pratense | Timothy grass | 5.849 | [3] |
| Poa annua | Annual meadow grass | 0.116 | [2] |
| Poa pratensis | Smooth meadow grass | 3.394 | [3] |
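The four-term product for the grass emission map can be sketched for one grid point as follows. The function and argument names are hypothetical, the grass cover is passed as a fraction rather than a percentage, and the factor of 10⁶ converts the inflorescence density from per m² to per km².

```python
def grass_emission_grains_per_km2(prob_by_species, grains_by_species,
                                  grass_fraction, inflorescences_per_m2=2200.0):
    """Grass emission (grains per km^2) at one grid point.

    prob_by_species   -- SDM probability of occurrence for each species
    grains_by_species -- grains per inflorescence for each species (Table 2)
    grass_fraction    -- estimated fraction of the grid box covered by grass
    """
    total_prob = sum(prob_by_species.values())
    if total_prob == 0.0:
        return 0.0  # no grass species expected here
    max_prob = max(prob_by_species.values())                      # term i)
    mean_grains = sum(prob_by_species[s] * grains_by_species[s]
                      for s in prob_by_species) / total_prob      # term ii)
    density_per_km2 = inflorescences_per_m2 * 1.0e6               # term iii)
    return max_prob * mean_grains * density_per_km2 * grass_fraction  # term iv)
```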
Finally, emissions ready for use in the model were converted into units of g m⁻², using pollen diameter and density information (Table 3). Resulting emissions are shown in Fig. 2, representing the maximum number of grains to be released in a year per unit area.
Table 3
Pollen grain parameters of diameter and density. Sources: [1] Emmerson et al. (2019); [2] Sofiev et al. (2006); [3] Panahi et al. (2012); [4] assumed to be the same density as for birch.
| | Pollen diameter (µm) | Pollen density (kg m⁻³) |
| Birch | 22 [2] | 800 [2] |
| Oak | 29 [3] | 800 [4] |
| Grass | 35 [1] | 1000 [1] |
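The conversion from grain counts to mass can be sketched from the Table 3 values. The text does not state the grain geometry used, so the example below assumes spherical grains; the names are hypothetical.

```python
import math

# Diameter (micrometres) and density (kg m^-3) from Table 3.
POLLEN = {"birch": (22.0, 800.0), "oak": (29.0, 800.0), "grass": (35.0, 1000.0)}


def grain_mass_g(taxon):
    """Mass of one pollen grain in grams, assuming a spherical grain."""
    diameter_um, density_kg_m3 = POLLEN[taxon]
    diameter_m = diameter_um * 1.0e-6
    volume_m3 = math.pi / 6.0 * diameter_m ** 3  # sphere volume (assumption)
    return density_kg_m3 * volume_m3 * 1000.0    # kg to g


def grains_per_km2_to_g_per_m2(taxon, grains_per_km2):
    """Convert an emission in grains per km^2 to grams per m^2."""
    return grains_per_km2 * grain_mass_g(taxon) / 1.0e6
```

Under this assumption a birch grain weighs roughly 4.5 × 10⁻⁹ g.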
2.5. Seasonal cycle
Seasonal variations in meteorological conditions exert a major influence on the timing and magnitude of events in the seasonal cycle of pollen, for example the start, peak and end dates of pollen emissions. The seasonal cycle is estimated using an accumulated temperature sum, or ‘heat sum’ (HS), above a base threshold temperature (Tbase) from an annual start date (DOYbase). This is similar to the method employed in SILAM (Sofiev et al., 2013) and COSMO-ART (Zink et al., 2013). Eq. (1) shows the heat sum equation at a given time t, with a current temperature Tt.
$$HS=\sum_{t>{DOY}_{base}}\max\left({T}_{t}-{T}_{base},\,0\right)$$
(1)
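Eq. (1) can be illustrated with a short accumulation loop. The sketch assumes daily mean temperatures, giving a heat sum in degree days, whereas the model accumulates at its own time step; the function name and input layout are hypothetical.

```python
def heat_sum(daily_mean_temps, t_base, doy_base):
    """Accumulate max(T - Tbase, 0) over days after DOYbase.

    daily_mean_temps -- list where index i holds the mean temperature (deg C)
                        for day of year i + 1
    """
    total = 0.0
    for doy, temp in enumerate(daily_mean_temps, start=1):
        if doy > doy_base:
            # Days colder than the base temperature contribute nothing.
            total += max(temp - t_base, 0.0)
    return total
```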
The parametrisation of the pollen seasonal cycle is derived in two stages: Firstly, the optimum threshold temperature (Tbase) and start date (DOYbase) are estimated by comparing the predicted to observed day of the year for varying thresholds; secondly, the heat sum is fitted to varying percentiles of APIn, with the area under its derivative representing the proportion of the total estimated emissions to be released in each timestep.
The first part involves calculating the threshold temperature (Tbase) and starting day of year (DOYbase) for accumulations. For this, we use the quality-controlled observations for 2011–2021 and calculate the day of the year on which the 5th, 50th and 95th percentiles of APIn are exceeded. For each of these dates we then calculate the heat sum using all possible combinations of Tbase in 1°C increments between 0–10°C and DOYbase in 15-day increments from 0 to the day of the year one week prior to the first observed start date. The median heat sum across all sites and years for each (Tbase, DOYbase) pair can be used to calculate the predicted day of the year on which that heat sum is exceeded. Statistical metrics comparing the predicted to observed day such as Root Mean Square Error (RMSE), Pearson correlation coefficient and the percentage of points correct to within ≤ 4 and ≤ 7 days can therefore be calculated, similar to the method used by Grundström et al. (2019). Finally, a judgement was taken by examining all metrics at the three different API levels as to which (Tbase, DOYbase) is best, as concluded in Table 4.
Table 4
Heat sum and heat sum factor (HFactor) parameters for each pollen group.
| | Birch | Oak | Grass |
| Tbase (°C) | 0 | 4 | 0 |
| DOYbase | 60 (~ 1st March) | 75 (~ 15th March) | 90 (~ 1st April) |
| First linear region (ramp up at beginning of season) |
| HSl1start | 149.120 | 81.441 | 113.025 |
| HSl1end | 263.150 | 175.238 | 680.579 |
| HSl1c | -1.293e-3 | -9.567e-4 | -3.614e-5 |
| HSl1m | 8.671e-6 | 1.175e-5 | 3.197e-7 |
| Derivative curve region |
| HSqd0 | 5.621e-1 | -1.691e-1 | -1.327e-1 |
| HSqd1 | -2*3.351e-3 | 2*1.008e-3 | 2*2.531e-4 |
| HSqd2 | 3*9.757e-6 | -3*2.690e-6 | -3*2.344e-7 |
| HSqd3 | -4*1.379e-8 | 4*3.261e-9 | 4*1.070e-10 |
| HSqd4 | 5*7.583e-12 | -5*1.415e-12 | -5*1.931e-14 |
| Second linear region (ramp down at end of season) |
| HSl2start | 474.920 | 377.264 | 1443.842 |
| HSl2end | 955.475 | 795.745 | 2050.539 |
| HSl2c | 7.324e-4 | 8.233e-4 | 5.549e-4 |
| HSl2m | -7.666e-7 | -1.035e-6 | -2.706e-7 |
Once the taxa-dependent heat sum was calculated, it was used to derive the seasonal cycle to be input to the model through the heat sum factor (HFactor). For each five-percentile increment of the APIn, the median heat sum was calculated across all sites and years, and a quintic curve was fitted to these data. The derivative was calculated, and the curve start and end points were limited to remove any inflection points. A straight line was fitted from these end points to the 0th percentile heat sum (start of season) and to the 100th percentile (end of season). These lines were adjusted such that the area under each matches the known APIn percentile at the end of the line (e.g. 5%, or more if the inflection points were removed). The derivative curve was then adjusted to ensure that the total area under the entire season curve was unity, so that the area over the whole season represents the emission for the whole year.
The combined line, showing the seasonal cycle, is illustrated in Fig. 3, with points on this line representing the fraction emitted at a given time t, FEt; this is shown mathematically in Eq. (2). Here qd refers to parameters of the quintic derivative curve, l1 the first linear region at the start of the season and l2 the second linear line at the end of the season. HSl1start and HSl1end are the heat sums at the start and end of the first linear region, while HSl2start and HSl2end represent the heat sum at the start and end of the second linear region at the end of the season: these parameters are all shown in Table 4.
The area under the curve between the fraction emitted at time t and at the previous timestep t-Δt represents the proportion of the annual total emission that is released during that timestep and is given by a trapezium-rule calculation of the area. The HFactort is therefore given in Eq. (3) where HSt is the heat sum at time t.
$${FE}_{t}=\begin{cases}{HS}_{l{1}_{c}}+{HS}_{l{1}_{m}}\,{HS}_{t}, & {HS}_{l{1}_{start}}\le {HS}_{t}<{HS}_{l{1}_{end}}\\ {HS}_{qd0}+{HS}_{qd1}\,{HS}_{t}+{HS}_{qd2}\,{HS}_{t}^{2}+{HS}_{qd3}\,{HS}_{t}^{3}+{HS}_{qd4}\,{HS}_{t}^{4}, & {HS}_{l{1}_{end}}\le {HS}_{t}\le {HS}_{l{2}_{start}}\\ {HS}_{l{2}_{c}}+{HS}_{l{2}_{m}}\,{HS}_{t}, & {HS}_{l{2}_{start}}<{HS}_{t}\le {HS}_{l{2}_{end}}\\ 0, & \text{otherwise}\end{cases}$$
(2)
$${HFactor}_{t}=0.5({HS}_{t}-{HS}_{t-\varDelta t})({FE}_{t}+{FE}_{t-\varDelta t})$$
(3)
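Eqs. (2) and (3) can be sketched as a piecewise function followed by a trapezium-rule step. The example uses the birch parameters as printed in Table 4; the dictionary layout and function names are hypothetical, and the sketch does not check continuity between the three regions.

```python
# Birch parameters copied from Table 4.
BIRCH = {
    "l1": (149.120, 263.150, -1.293e-3, 8.671e-6),  # start, end, c, m
    "qd": (5.621e-1, -2 * 3.351e-3, 3 * 9.757e-6,
           -4 * 1.379e-8, 5 * 7.583e-12),           # qd0 to qd4
    "l2": (474.920, 955.475, 7.324e-4, -7.666e-7),  # start, end, c, m
}


def fraction_emitted(hs, params=BIRCH):
    """FE_t of Eq. (2) as a function of the heat sum."""
    l1_start, l1_end, l1_c, l1_m = params["l1"]
    l2_start, l2_end, l2_c, l2_m = params["l2"]
    if l1_start <= hs < l1_end:
        return l1_c + l1_m * hs                      # ramp up
    if l1_end <= hs <= l2_start:
        return sum(coef * hs ** k for k, coef in enumerate(params["qd"]))
    if l2_start < hs <= l2_end:
        return l2_c + l2_m * hs                      # ramp down
    return 0.0                                       # outside the season


def h_factor(hs_now, hs_prev, params=BIRCH):
    """HFactor_t of Eq. (3): trapezium-rule area between two heat sums."""
    return 0.5 * (hs_now - hs_prev) * (
        fraction_emitted(hs_now, params) + fraction_emitted(hs_prev, params)
    )
```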
2.6. Short term meteorological dependencies
In any pollen season, the modelled pollen emissions will start, peak and end according to the general seasonal cycle parametrised by the heat sum described above. However, within a season there will be short-term variability, driven by changing meteorological conditions, which can inhibit emissions. We therefore now discuss the key parameters used in NAME to model these effects.
2.6.1. Precipitation
Rainfall can suppress pollen emissions, so a precipitation factor (PFactor) is used in the model. We use the same parameterisation as given by Sofiev et al. (2013) and shown in Eq. (4), where Pt is the precipitation rate in mm hr⁻¹. This results in emissions decreasing linearly until precipitation reaches 0.5 mm hr⁻¹, beyond which there are no emissions. Other rainfall schemes were also tested; the modelled concentrations were not particularly sensitive to the scheme used, but that of Sofiev et al. (2013) best matched the observed concentration variations during the 2018 and 2019 seasons.
$${PFactor}_{t}=\begin{cases}1, & {P}_{t}=0\\ 1-2{P}_{t}, & 0<{P}_{t}<0.5\\ 0, & {P}_{t}\ge 0.5\end{cases}$$
(4)
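Eq. (4) reduces to a clamped linear ramp, sketched below with a hypothetical function name. Negative inputs are treated as zero precipitation, which is an added safeguard rather than part of the published scheme.

```python
def p_factor(precip_mm_per_hr):
    """Precipitation factor of Eq. (4) for a rate in mm per hour."""
    rate = max(precip_mm_per_hr, 0.0)  # safeguard: treat negative input as dry
    if rate >= 0.5:
        return 0.0                     # emissions fully suppressed
    return 1.0 - 2.0 * rate            # equals 1 when it is dry
```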
2.6.2. Wind
Some pollen can be emitted at low wind speeds when there is thermal convection. As wind speed increases, pollen emissions increase up to the point at which they are limited by the availability of pollen grains. We use the wind factor (WFactor) parametrisation from Sofiev et al. (2013), as given in Eq. (5), where Ut is the wind speed (m s⁻¹) at 10 m, w*t is the convective velocity scale (m s⁻¹) and the constants are fstagnant = 0.5, fpromote = 1.0 and Usatur = 5 m s⁻¹.
$${WFactor}_{t}={f}_{stagnant}+{f}_{promote}\left(1-\exp\left(-\frac{{U}_{t}+{w}^{*}_{t}}{{U}_{satur}}\right)\right)$$
(5)
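The wind factor can be sketched as below. The example assumes the form of the Sofiev et al. (2013) scheme in which fpromote multiplies the saturating exponential term, so the factor rises from fstagnant in calm air towards fstagnant + fpromote in strong wind; the function name and keyword arguments are hypothetical.

```python
import math


def w_factor(wind_10m, w_star, f_stagnant=0.5, f_promote=1.0, u_satur=5.0):
    """Wind factor for 10 m wind speed and convective velocity scale (m/s).

    Assumes f_promote scales the saturating term (see lead-in).
    """
    saturation = 1.0 - math.exp(-(wind_10m + w_star) / u_satur)
    return f_stagnant + f_promote * saturation
```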
2.6.3. Vapour pressure deficit
Vapour pressure deficit (VPD) is the difference, or deficit, between the amount of moisture in the air and how much moisture the air can hold when it is saturated. It is an important variable for vegetation because transpiration, and therefore plant growth, depends on it. In the context of pollen emissions, a dry atmosphere promotes pollen release. Although only a few studies consider VPD as a parameter for controlling pollen emissions (Schueler & Schlünzen, 2006; van Hout et al., 2008), combining temperature and relative humidity into VPD (Pa) should lead to improved emission predictions. VPD is calculated via the saturated vapour pressure (SVP), as shown in Equations (6) and (7), where T is temperature (°C) and RH is relative humidity (%).
$$SVP=610.7*{10}^{\frac{7.5T}{237.3+T}}$$
(6)
$$VPD= \left(1-\frac{RH}{100}\right).SVP$$
(7)
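Eqs. (6) and (7) translate directly into two small functions; the names are hypothetical and both return values in Pa.

```python
def saturated_vapour_pressure(temp_c):
    """SVP in Pa from Eq. (6), for temperature in deg C."""
    return 610.7 * 10.0 ** (7.5 * temp_c / (237.3 + temp_c))


def vapour_pressure_deficit(temp_c, rh_percent):
    """VPD in Pa from Eq. (7), for relative humidity in percent."""
    return (1.0 - rh_percent / 100.0) * saturated_vapour_pressure(temp_c)
```

At 20 °C the saturated vapour pressure is close to 2340 Pa, so air at 50% relative humidity has a VPD of roughly 1170 Pa.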
To develop a parametrisation based on VPD, hourly temperature and relative humidity values were extracted at pollen observation sites using NAME and VPD was then calculated. Hourly pollen concentrations were matched to hourly VPD. The data were then divided into 25 equal-size VPD bins and the mean pollen concentration was calculated for each bin. Bins containing fewer than 5% of the expected number of data points (given the total number of data points and bins) were classified as less reliable and were given a lower weighting when a sine curve was fitted through the mean binned data.
The VPDFactor was therefore modelled as a sine curve with a maximum of one, shown by Eq. (8), where the parameter VPDmax is given in Table 5 and varies by taxon.
$${VPDFactor}_{t}= \left\{\begin{array}{cc}\text{sin}\left(\frac{\pi }{VPDmax}.\left({VPD}_{t}+\pi \right)\right),& VPD<VPDmax\\ 0,& VPD\ge VPDmax\end{array}\right.$$
(8)
Table 5
Taxa-specific parameters for Vapour Pressure Deficit (VPD) and diurnal cycle parameterisations.
| | VPDmax (Pa) | Tshift (hours) |
| Birch | 2725 | 9 |
| Oak | 2619 | 9 |
| Grass | 4135 | 12 |
2.6.4. Diurnal cycle
Pollen concentrations vary significantly during the day. Applying the meteorologically dependent parameterisations described above in NAME results in some diurnal variation; however, this was found to match the variation shown in the observations poorly. Consequently, a simple diurnal profile (DFactor) was developed using a cosine curve as shown in Eq. (9), where Hrt is the current hour in the day and tshift is given in Table 5. The value of tshift was found by running NAME for 2018 with varying values of tshift to find the best match in the peak season. NAME was used here to account for the difference between when plants emit pollen and the time taken to advect downwind to the monitoring sites. The importance of including this factor is shown in Fig. 4, which illustrates moving the peak from around 8pm to mid-afternoon for the trees and late afternoon for grass, as indicated by the bihourly observations.
$${DFactor}_{t}=1+\cos\left(\left({Hr}_{t}+24-{t}_{shift}\right)\frac{\pi }{12}\right)$$
(9)
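The diurnal profile of Eq. (9) is sketched below with a hypothetical function name. The factor peaks at 2 when the hour equals tshift and falls to 0 twelve hours later; sampled at each of the 24 hours it averages to one, so it redistributes emission within the day without changing the daily total.

```python
import math


def d_factor(hour, t_shift):
    """Diurnal factor of Eq. (9) for an hour of day (0 to 23) and shift in hours."""
    return 1.0 + math.cos((hour + 24 - t_shift) * math.pi / 12.0)
```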
2.7. Implementation in NAME
Within the NAME model, the heat sum is incremented every time step (5 minutes) and can be output and read in again should the model need to be stopped and restarted (which is done daily). A new pollen emission scheme has been added to the code, which applies the seasonal and daily parameterisations described above. This is shown by Eq. (10), where S is the source emission (g gridbox⁻¹) for the current timestep, the factors are as described above, refSS is the reference source strength as read from the emissions (Section 2.4) and converted within NAME to g gridbox⁻¹ s⁻¹, sPerYear converts emission strength to g gridbox⁻¹ yr⁻¹ and deltaTime accounts for the time step duration of the current release. Once emitted within NAME, the pollen is advected and undergoes deposition before hourly concentrations are calculated.
$$S=refSS*HFactor*PFactor*WFactor*VPDFactor*DFactor* \frac{sPerYear}{deltaTime}$$
(10)
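Eq. (10) is a straight product of the factors, sketched below. This is not the NAME source code: the function and argument names are hypothetical, and the default number of seconds per year assumes a 365-day year.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def source_emission(ref_ss, h_factor, p_factor, w_factor, vpd_factor, d_factor,
                    delta_time_s, s_per_year=SECONDS_PER_YEAR):
    """Source term S of Eq. (10) for one time step.

    ref_ss       -- reference source strength (g per gridbox per second)
    delta_time_s -- duration of the current release in seconds
    """
    factors = h_factor * p_factor * w_factor * vpd_factor * d_factor
    return ref_ss * factors * s_per_year / delta_time_s
```

Because the terms multiply, any single factor of zero (for example, rainfall of 0.5 mm hr⁻¹ or more) switches off the emission for that time step.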
2.8. Model evaluation set-up
The NAME pollen model has been run for two years, 2022 and 2023, which differed in their meteorology and in the resulting pollen levels. Spring 2022 was warmer than average in March and May and drier than average in April (UK Met Office, 2023). In contrast, spring 2023 had rainfall and temperatures that were generally closer to average, although March was very wet, especially in the south where there was double the average rainfall, and May was warmer than average. June was warm and dry in both years, but particularly hot in 2023, which was the warmest June recorded in the UK. July 2022 was very hot and dry, with a new UK record of 40.3°C recorded in Lincolnshire on the 19th, whereas July 2023 was cooler than average and the wettest since July 2009.
The CAMS ensemble (METEO FRANCE et al., 2022; Sofiev et al., 2015) produces a daily pollen forecast for Europe from 11 process-based air quality forecasting models, adapted to include pollen, which are combined to produce an ensemble median. Its modelling domain covers Europe at 0.1° resolution (compared to 0.05° for NAME). Six pollen taxa are included in the CAMS models: birch, grass, alder, mugwort, olive and ragweed; oak pollen is not currently modelled. CAMS therefore provides a good comparator to the Met Office models and is included in verification metrics where appropriate.
There is a significant interannual variation in observed pollen levels, particularly in trees such as birch and oak. This is dependent on factors outside of the current season, for example weather conditions in the preceding summer (Adams-Groom et al., 2022) and is not currently captured by the NAME model. However, for examining the behaviour of the model for historical seasons, this can be overcome by scaling the model forecasts by a taxa and year-dependent factor. Here we have defined a scaling factor as the ratio of the observed to NAME-predicted annual pollen integral (APIn). This varies by site, but we take the median scaling factor across all sites to apply a consistent value suitable for use with gridded data. This factor is only applied to the NAME forecast on the assumption that the NPARU-MO and CAMS forecasts are able to predict this variation given they are used operationally. This may give NAME an advantage for some statistics, but the correlation coefficient will remain a fair metric. The use of the scaling factor also allows for correction of uncertainties in the pollen production values used in the emission maps. In theory a mean scaling factor, calculated from multiple years, could be applied to the emissions prior to running NAME, but this should give the same results as scaling the final concentrations in post-processing. Operationally, a mean scaling factor value could be used, but this would not account for the interannual variability which is still an active area of research.
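The scaling factor described above can be sketched as the median of per-site ratios. The function name and the dictionary inputs (keyed by site) are hypothetical; sites without a positive modelled APIn are skipped to avoid division by zero, which is an added safeguard rather than a step stated in the text.

```python
from statistics import median


def scaling_factor(observed_apin, modelled_apin):
    """Median over sites of observed APIn divided by NAME-predicted APIn.

    Both arguments are dicts mapping site name to APIn for one taxon and year.
    """
    ratios = [
        observed_apin[site] / modelled_apin[site]
        for site in observed_apin
        if modelled_apin.get(site, 0.0) > 0.0  # safeguard against division by zero
    ]
    return median(ratios)
```

The resulting single value per taxon and year can then be multiplied into the gridded NAME concentrations in post-processing.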
The skill of the NAME model has been verified against daily mean observed pollen concentrations from the UK Pollen Network, which are routinely available from around 14th March until around 5th September, with some variation between sites. Although NAME produces hourly concentrations, we have focussed the verification in this paper on daily mean values, as this allows easier comparison to the NPARU-Met Office forecast and is the clearest way to illustrate the skill of the model at predicting the key seasonal variations. The observed DPI included in the verification includes all nine taxa (birch, oak, grass, hazel, alder, ash, plane, nettle and mugwort) which are both regularly monitored and can be converted to DPI, using the thresholds in Table S2. NAME has also been compared to the existing NPARU-Met Office Daily Pollen Index forecast (‘NPARU-MO’) and to the CAMS ensemble median (‘CAMS’), using the individual birch and grass concentrations and the calculated Daily Pollen Index (noting that CAMS does not include oak, which is an important component of the Daily Pollen Index). We also provide statistical metrics for a DPI calculated using only the three taxa of birch, oak and grass, which gives greater insight into the usability of the NAME model.