Machine Learning approaches in the evaluation of pXRF data in provenance studies of transport amphorae

doi:10.21203/rs.3.rs-8310173/v1

Research Article

https://doi.org/10.21203/rs.3.rs-8310173/v1

This work is licensed under a CC BY 4.0 License

Ceramics that were manufactured at a specific location from local clayey raw materials, following a particular workflow of clay paste processing, are assumed to exhibit a characteristic composition in terms of elemental concentrations, mineralogical compounds and petrographic fabric. Examining ceramics from different manufacturing sites or regions, for example through elemental analysis, allows compositional categories to be defined as references for the origin of manufacture. Examining ceramics from trading or consumption sites, on the other hand, allows them to be assigned to these reference categories and thus their origin and dissemination to be investigated. In the particular case of transport amphorae, ancient trade networks for commodities such as wine, oil or grain can be investigated. For the elemental analysis of archaeological ceramics, laboratory methods such as neutron activation analysis (NAA) or wavelength-dispersive X-ray fluorescence spectrometry (WD-XRF) are commonly applied. They provide high analytical performance but require sampling of an albeit minute amount of material from the ceramic artifact. Handheld portable energy-dispersive XRF (pXRF), on the other hand, allows for non-invasive analysis of large numbers of ceramic artifacts within comparably short time periods. A major drawback of pXRF, though, is its higher analytical uncertainty in terms of precision or reproducibility as well as accuracy. This eventually impedes both the statistical data evaluation with approaches commonly applied to multivariate quantitative data from laboratory analyses and the comparison with external reference data. However, even though pXRF data might be blurrier or fuzzier, they still represent compositional similarities or dissimilarities, which might be revealed with alternative approaches to categorization. 
An initial case study testing unsupervised machine learning with self-organizing maps (SOM) on a dataset of Hellenistic transport amphorae from the Paphos Agora, a marketplace in Cyprus, indicated the potential of automated categorization of pXRF data through machine learning. In the present case study, supervised machine learning models, such as support vector machines (SVM), random forests (RF) and supervised artificial neural networks (ANN), have been tested on pXRF data of transport amphorae from East Aegean islands and from Paphos. For this purpose, NAA data for a part of the analysed amphora fragments have been used to predefine compositional categories in the training data. The present data repository at our laboratory comprises c. 2200 measurements of c. 1400 individual amphora fragments from production centres as well as exchange centres in the Eastern Mediterranean region. Even though this is a comparably large number of data records, the generation of synthetic training data was also tested. The ultimate aim of the present case study is to train a machine learning model for automated pattern recognition and prediction of the origin of manufacture of transport amphorae in order to study trading networks in the region.

Eastern Mediterranean

transport amphorae

pXRF

trade networks

machine learning

The general scope of provenance studies of archaeological ceramics is to define categories based on specific attributes, which define ceramic wares with a particular origin of production. The investigated ceramic objects can be considered as cases representing individual categories. Once these categories have been defined, cases of unknown origin can be categorized accordingly. In this way, the organization of manufacture can be investigated as well as the dissemination of ceramic wares within past trade networks. One common approach for defining ceramic categories is the analysis of their elemental composition. The underlying hypothesis is that ceramics, which have been produced at a specific place of production, present a distinct elemental composition, which can be distinguished from the elemental compositions of ceramics produced elsewhere. This assumed compositional distinctiveness is primarily related to the natural geochemical diversity of the used clayey raw materials, the extraction of which was commonly constricted to the proximate vicinity of the manufacturing site. The raw materials, however, were further processed in order to obtain a workable clay paste for fashioning and shaping vessels or other objects, which furthermore exhibited feasible material properties during firing of the ceramics as well as in view of the use and function of the ceramics (Tite 2008). Typical steps during the clay paste processing are refinement of the raw materials through homogenization and removal of large non-plastic inclusions, the addition of temper materials for improving thermo-mechanical properties and the potential mixing of different clays. For this, the elemental composition of ceramics commonly cannot be directly linked to the elemental composition of the raw materials used for their manufacture (Hein and Kilikoglou 2020). 
On the other hand, the individual workflow followed by craftspeople in different workshops might even allow for distinguishing ceramics manufactured from the same initial clay sources. For the definition of a specific ceramic category, though, its compositional variation has to be taken into consideration, which includes the natural variation of the clay sources as well as the variation introduced during the workflow of clay processing (Hein and Kilikoglou 2017). The true composition of the ceramics, furthermore, is potentially affected by post-depositional alteration (Buxeda I Garrigos 1999). Nevertheless, provenance studies based on elemental analysis have proven to provide fundamental results for identifying reference patterns of individual manufacturing places and investigating the dissemination of ceramics (Mommsen et al. 2002, Glascock and Neff 2003).

The common methodological approach for examining a pottery assemblage for elemental composition is to take samples of the ceramic bodies and to determine their bulk composition in the laboratory using well-established methods for silicate analysis, such as neutron activation analysis (NAA) or wavelength-dispersive X-ray fluorescence analysis (WD-XRF). These laboratory methods provide high analytical precision and based on their calibration with standard reference materials (SRM) also high analytical accuracy (Hein et al. 2002). An alternative approach, which has been introduced in the field during the recent two decades, is the non-invasive elemental analysis using handheld portable energy dispersive XRF (pXRF) systems (Morgenstein and Redmount 2005, Goren et al. 2011, Hunt and Speakman 2015). The advantages of pXRF in comparison with laboratory methods are apart from the provided integrity of the analysed object, as no sampling is necessary, the possibility to take the measurements on site and to achieve prompt analytical results. This allows for measuring considerably larger numbers of objects within short time periods. On the other hand, a series of issues have to be considered, which affect the precision and accuracy of the analytical results. Due to the absorption of exciting radiation and fluorescence radiation the method is extremely surface sensitive so that for the determination of the bulk composition of the ceramic body spots without obvious decoration, coating or deterioration should be selected. Furthermore, the systems are commonly calibrated for measurements of flat samples in negligible distance to the pXRF window, which might distort measurements of objects with irregular surface geometries. Due to the limited electrical power of the usually battery driven X-ray source and the commonly shorter measurement times the total counts in the recorded XRF spectra are comparably small increasing statistical uncertainties. 
This affects the precision of the estimated peak areas and increases the lower limit of detection of the individual elements. For this, the number of element concentrations or attributes for categorization, which potentially can be determined by pXRF is usually smaller in comparison with WD-XRF or ED-XRF systems applied in a laboratory. The above-discussed analytical uncertainties contribute considerable variability to the determined elemental compositions of the analysed ceramic objects and might obscure significant differences of distinct compositional patterns, which can be detected using more precise analyses using laboratory methods. The conventional approaches for the evaluation of compositional data are based on multivariate statistics, which have been developed for laboratory data (Baxter et al. 2008). These statistical methods, though, produce to some extent ambiguous results evaluating the comparably fuzzier compositional data obtained with pXRF (Holmquist 2016). It has to be considered that the conventional pattern recognition takes into account assumptions regarding the geochemical variation of the initial raw materials. These become less relevant regarding pXRF data. A more flexible approach taking into account the actual data structure is expected to be more feasible. During the recent years the application of artificial intelligence and machine learning algorithms has become increasingly important in chemistry (Käser et al. 2023) as well as in Cultural Heritage studies (Fiorucci et al. 2020). For this, a series of machine learning approaches has been tested for categorizing pXRF data. The examined data were collected by analyzing Archaic, Hellenistic and Roman transport amphorae from the Eastern Aegean region. Transport amphorae were the standard containers of the ancient world, so that the investigation of their dissemination provides valuable information concerning trade networks and economic relations. 
Their compositional analysis with pXRF allows for collecting an amount of data, which hardly can be achieved with laboratory analyses. For the automated pattern recognition and categorization of these data machine learning approaches appear to be a promising solution.

In the present study, data are tested from a repository that comprises c. 2200 pXRF datasets obtained from the measurement of c. 1400 individual amphora fragments. The assemblages come from various amphora production centres, such as on the Dodecanese islands of Kos (Halasarna: 36.777359 N, 27.138311 E; Kos Town: 36.894241 N, 27.287766 E), Rhodes (Rhodes Town: 36.440479 N, 28.225664 E) and Samos (Heraion: 37.671944 N, 26.885555 E), as well as from Hellenistic and Roman market and consumption places, such as Paphos in Cyprus (Agora: 34.760159 N, 32.408039 E). The amphora data were recorded using a NITON XL3t GOLDD+ hand-held system (Thermo Fisher Scientific) with the preset ‘soil’ method (Hein 2021). With the ‘soil’ method three different energy ranges (‘main’, ‘low’, ‘high’) are measured with a pre-selected lifetime of 120 s. The measurements in air provide the concentrations of potentially up to 33 elements in the range of S to U. In the case of the present pXRF data, though, only 16 of these element concentrations have been considered as attributes for the compositional categorization: K, Ca, Ti, V, Cr, Mn, Fe, Ni, Cu, Zn, Rb, Sr, Zr, Pb, Th and U. The other elements presented concentration values that were to a large extent below the limit of detection, or they were excluded either due to their known geochemical mobility, such as S and As, or due to apparent discrepancies in the determination of the true value, such as Sc or Co. Particularly minor and trace elements have proven their potential in discriminating ceramics manufactured at different sites (Hein and Kilikoglou 2017). The measurements of the amphora fragments were taken preferably in broken sections, which were cleaned if necessary. As no collimator has been used, the measurement area has an estimated diameter of c. 6 mm. 
Apart from the three spectra, during each measurement a photograph of the analysed area is recorded with the integrated camera, so that conspicuous discrepancies of specific element concentrations can be scrutinized taking into account inclusions or contamination of the analyzed ceramic body.

For a part of the analysed amphora fragments complementary compositional data exist, which have been collected using neutron activation analysis (NAA). For this, sub-samples of c. 200-300 mg have been cut from the selected fragments and powdered after removing surface layers. The ceramic powders have been irradiated in a research reactor together with standard reference materials (SRM) and γ-spectra of the irradiated samples were recorded one week after irradiation and c. three weeks after irradiation. The spectra of the ceramic samples were evaluated and calibrated with the spectra of the SRMs revealing concentrations of 27 to 34 primarily minor and trace elements. Due to the high precision of the NAA data and the accuracy provided through the calibration with SRMs NAA datasets, which have been measured at the N.C.S.R. “Demokritos” research reactor (Hein et al. 2008), straightforwardly can be compared with NAA data measured at the University of Missouri Research Reactor (MURR) (Hein et al. 2021). On the basis of the NAA data amphora fragments have been categorized and compared with potential reference patterns accessible on the open access database of our laboratory (Hein and Kilikoglou 2012). This database comprises elemental concentrations of archaeological ceramics determined by NAA and information about compositional categorization. In the present case study, the respective categories have been used as predefined classes for training and testing pXRF datasets collected through analysis of the same fragments in supervised machine learning models. Thus, a training dataset has been compiled comprising 637 pXRF analyses of 188 amphora fragments, which are assumed to represent 33 different compositional categories (Table 1) (supplementary data). The machine learning models are furthermore tested on two pXRF datasets Koan_AMPH and Paphos_AMPH. The Koan_AMPH dataset comprises primarily amphora fragments manufactured in Kos Island (supplementary data) (Table 2). 
An essential part of it concerns Hellenistic amphorae excavated in South Central Kos (Hein et al. 2008), which are also the basis of six of the seven Koan categories included in the training dataset. Apart from these, Hellenistic amphorae from other parts of the island as well as Archaic and Roman amphorae are included. Furthermore, the dataset comprises Koan type amphorae excavated in Attica and in the Levant as well as eight samples of fired clays collected in Kos. The Paphos_AMPH dataset, on the other hand, comprises 368 pXRF measurements of 291 amphora fragments from the Paphos Agora comprising local amphora types as well as imports from the Aegean and the Western Mediterranean (Hein et al. 2021).

3.1 Multivariate statistics and formation of compositional categories

The outcome of the elemental analysis of a ceramic assemblage is typically a p×n data matrix with p the number of cases analysed and n the number of attributes or in this case concentrations of individual elements. In order to explore and define compositional categories the multivariate dataset commonly is transformed to a dataset with reduced dimension applying for example hierarchical cluster analysis or principal component analysis (PCA) (Baxter et al. 2008). For assessing similarities or dissimilarities among individual cases, though, the data first have to be scaled or normalized particularly because concentrations of different elements might cover ranges from a few µg/g up to percentages of the content. Common approaches for normalization of compositional data are the additive logratio transformation (alr) or the center logratio transformation (clr), which has been used in the present case study (Buxeda I Garrigos 1999, Baxter and Freestone 2006). Categories are defined based on reference patterns x, which are basically vectors (x₁, x₂, …, x_n) comprising the expected values of the attributes measured for cases of the same category (Hein and Kilikoglou 2017). For assessment of compositional similarities model-based approaches in contrast to exploratory data analysis postulate a data model for the attributes of cases belonging to the same category (Papageorgiou 2020). The data model includes apart from the actual pattern x an estimation of its variance. This can be in the simplest case the normal or log-normal distribution of each attribute or the multivariate normal distribution (MVN) taking into account correlations among attributes. The MVN of a reference pattern can be estimated by determining the covariance matrix S_x based on the measured sample data (Baxter 2001). 
Taking into account S_x, the similarity or dissimilarity of patterns of different categories, or of an individual case with selected categories, can be assessed with the Mahalanobis distance (Baxter 2001). For the simpler model of normal distributions disregarding correlations, a distance can be determined that is based only on the diagonal terms of the covariance matrix, which are in fact the squared standard deviations (Hein and Kilikoglou 2017). Both distances can be normalized with n, the number of attributes, and in this way provide the probability of similarity.
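The center logratio transformation and the two distances described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the evaluation code used in the study: the function names are hypothetical, the example concentrations are random placeholder values, and the pseudo-inverse is an assumption introduced here because the covariance matrix of clr-transformed data is singular (each transformed row sums to zero).

```python
import numpy as np


def clr(concentrations):
    """Center logratio transform; rows are cases, columns are elements.

    All concentrations must be strictly positive.
    """
    logs = np.log(np.asarray(concentrations, dtype=float))
    return logs - logs.mean(axis=1, keepdims=True)


def mahalanobis_sq(case, pattern, cov):
    """Squared Mahalanobis distance of a case from a reference pattern.

    Assumption: the pseudo-inverse is used so that singular covariance
    matrices (as produced by clr data) can still be handled.
    """
    diff = np.asarray(case, dtype=float) - np.asarray(pattern, dtype=float)
    return float(diff @ np.linalg.pinv(cov) @ diff)


def diagonal_sq(case, pattern, cov):
    """Squared distance using only the variances (diagonal of cov)."""
    diff = np.asarray(case, dtype=float) - np.asarray(pattern, dtype=float)
    return float(np.sum(diff**2 / np.diag(cov)))


# Placeholder data: 40 cases with 5 positive "concentrations" each.
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=3.0, sigma=0.3, size=(40, 5))
z = clr(raw)

pattern = z.mean(axis=0)          # reference pattern x of the category
s_x = np.cov(z, rowvar=False)     # covariance matrix S_x
n_attributes = z.shape[1]

# Distances of the first case, normalized with the number of attributes n.
d_mvn = mahalanobis_sq(z[0], pattern, s_x) / n_attributes
d_diag = diagonal_sq(z[0], pattern, s_x) / n_attributes
```

Both functions return zero for a case that coincides with the reference pattern and grow as the case moves away from it, the first weighting the difference by the full covariance structure and the second only by the individual variances.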

In the case of multivariate compositional datasets collected using laboratory methods, such as NAA or WD-XRF, the data can usually be processed straightforwardly using exploratory and/or model-based methods. These methods provide considerably precise and accurate measurements of the true values of attributes, based on which the MVN of individual categories can be estimated. In this way, stable data repositories with reference patterns of diverse production places can be assembled for investigating the dissemination of pottery wares (Hein and Kilikoglou 2012). Datasets collected with pXRF, on the other hand, involve additional analytical uncertainties potentially obscuring similarities and dissimilarities among compositional categories. For this reason, treating pXRF data with conventional statistical approaches commonly provides a less detailed compositional categorization compared to the data evaluation based on NAA or WD-XRF (Hein et al. 2021).

3.2 Self Organizing Maps (SOM)

Self Organizing Maps (SOM) are unsupervised artificial neural networks (ANN), which are trained to project multivariate data on a two-dimensional array of discrete codebook vectors or output neurons (Kohonen 1995). The similarity or dissimilarity of the original data is transferred to topological proximity of the output neurons in the modeled SOM. Thus, the placement of cases in the same codebook vector field of the SOM or in an adjacent field indicates compositional similarity. This allows for distinguishing cases allocated in different regions of the SOM according to their elemental composition (Hazenfratz et al. 2017). In an initial case study of unsupervised machine learning applied to pXRF data the kohonen package (version 3.0.10) has been used for testing the generation of SOM models of the Paphos_AMPH data set (Hein and Kilikoglou 2024). This package is available in R and it provides apart from the training of the SOM various visualization methods (Wehrens and Kruisselbrink 2018).

3.3 Support Vector Machines (SVM)

Support Vector Machines (SVMs) are a class of supervised machine learning algorithms, which can be used for classification and for analytical regression of multivariate data (Allegretta et al. 2020). During the training, hyperplanes are determined in the multi-dimensional space of the input data, which separate the multivariate data according to the predefined classes and maximize the margin between these classes. The SVM models can be generated by using different types of kernels, such as linear, polynomial or gaussian (radial basis function, rbf), and by defining the regularization parameter C (Ruschioni et al. 2023). In the present case study the scikit-learn Python library (Version 1.5) was used for the implementation (https://scikit-learn.org/stable/).
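As an illustration of how kernels and the parameter C might be compared with scikit-learn, the following sketch runs a small grid search for each of the three kernel types. The grid of C values, the three-fold cross-validation, the helper name `compare_kernels` and the random placeholder data are assumptions made for this example and are not taken from the study.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def compare_kernels(X, y, c_grid=(0.1, 1.0, 10.0, 100.0)):
    """Cross-validated search over C for linear, polynomial and rbf kernels."""
    results = {}
    for kernel in ("linear", "poly", "rbf"):
        # Default scoring for a classifier in GridSearchCV is accuracy.
        search = GridSearchCV(SVC(kernel=kernel), {"C": list(c_grid)}, cv=3)
        search.fit(X, y)
        results[kernel] = {
            "best_C": search.best_params_["C"],
            "cv_accuracy": search.best_score_,
        }
    return results


# Placeholder data: three well-separated "categories" with 16 attributes,
# standing in for normalized element concentrations.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 16))
X = np.vstack([c + rng.normal(size=(30, 16)) for c in centers])
y = np.repeat(["cat-0", "cat-1", "cat-2"], 30)

kernel_results = compare_kernels(X, y)
```

The returned dictionary holds, for each kernel, the value of C with the best mean cross-validated accuracy, which makes it easy to see whether the choice of kernel matters for a given training dataset.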

3.4 Random Forest (RF)

Random Forests (RF) are supervised machine learning algorithms, which combine and assess ensembles of decision tree classifiers for improving their prediction accuracy. The method is based on ensemble learning by bootstrap aggregating or bagging of regression trees in order to predict classifications of multivariate data as well as mixed-mode data (Brokamp et al. 2017). The individual decision trees are trained with random subsets of the training data taking into consideration random subsets of the attributes. The RF models were likewise implemented with the scikit-learn Python library (Version 1.5) (https://scikit-learn.org/stable/).

3.5 Artificial Neural Network

Artificial Neural Networks (ANN) comprise two or more layers of artificial neurons or perceptrons, which are connected with each other and can be activated through predefined activation functions depending on the transferred signal and the weight of the connection. In this way, data in the input layer are linked with data in the output layer. During the training of the ANN the weights of the connections between individual perceptrons are adapted stepwise in order to optimize the prediction of output attributes according to input attributes. The SOM discussed above are a rather simple ANN comprising only input and output layers. In a supervised ANN model, on the other hand, which commonly also comprises hidden perceptron layers, the classification of the training data is predefined and the weights are optimized stepwise accordingly. ANN models have manifold applications in the classification and categorization of multivariate and/or mixed-mode data in general and in the classification of XRF data in particular. Two different approaches are followed, though: the evaluation of elemental compositions (Barone et al. 2019, Ruschioni et al. 2023, Kodikara et al. 2024) and the evaluation of the initial spectral data (Shugar et al. 2021, Rieger et al. 2023, Andric et al. 2024). In the present case study the compositional pXRF data discussed above were used for training and testing the categorization with supervised sequential ANN models. For this, the Keras Python API for TensorFlow (Version 2.18.0) was applied for the implementation (Chollet 2017).
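A sequential model of this kind might be defined in Keras as sketched below, using the layer sizes, activations, loss function and optimizer settings reported in Section 4.2. The helper name `build_ann`, the placeholder data and the use of a Keras `Normalization` layer for the normalized input layer are assumptions for illustration, since the text does not state how the input normalization was implemented.

```python
import numpy as np
from tensorflow import keras


def build_ann(train_data, n_classes, first_hidden=256):
    """Sequential classifier with two hidden ReLU layers and a softmax output."""
    # Assumption: input normalization is done with a Normalization layer
    # whose mean and variance are estimated from the training data.
    normalizer = keras.layers.Normalization()
    normalizer.adapt(train_data)

    model = keras.Sequential(
        [
            keras.Input(shape=(train_data.shape[1],)),
            normalizer,
            keras.layers.Dense(first_hidden, activation="relu"),  # 64, 128 or 256
            keras.layers.Dense(32, activation="relu"),
            keras.layers.Dense(n_classes, activation="softmax"),
        ]
    )
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.00001),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Placeholder data: 90 cases, 16 attributes, 3 integer-coded categories.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 16)).astype("float32")
y = np.repeat(np.arange(3), 30)

model = build_ann(X, n_classes=3)
# Softmax output: one probability per category for each case.
probabilities = model.predict(X[:5], verbose=0)
```

Training would then follow with `model.fit(X, y, epochs=...)` on the training portion of the data. With ‘sparse_categorical_crossentropy’ the categories are passed as integer codes, so category names have to be mapped to integers beforehand.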

4.1. Unsupervised Machine Learning

In an initial case study the dataset Paphos_AMPH, comprising 366 pXRF measurements of 288 amphora fragments from Paphos, had been tested in an SOM model (Hein and Kilikoglou 2024). In order to conform to the previously published evaluation of the same dataset with conventional statistics (Hein et al. 2021), the number of attributes was reduced to fifteen by disregarding U. For normalization of the dataset the additive logratio transformation (alr) was applied to the element concentrations with the Fe concentrations as common divisor. A toroidal 10×10 base map with hexagonal topology was defined. The codebook vectors were initialized with random values and then trained stepwise using the squared Euclidean distance as distance criterion. Based on the hierarchical clustering of the resulting codebook vectors, 20 clusters were defined, corresponding to the number of categories expected from the evaluation of NAA data of a part of the samples. The SOM clusters not only confirmed the clustering of the same dataset using conventional exploratory statistics (Hein et al. 2021) but also allowed some of the original clusters to be subdivided, thus providing a level of detail in defining categories similar to that of the NAA data. A drawback of this initial case study of unsupervised machine learning, though, was that due to the random initialization of the codebook vectors the resulting SOM was not necessarily reproducible and particular cases appeared to be ambivalent in terms of categorization.
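The additive logratio transformation and the basic SOM training loop can be illustrated with a short NumPy sketch. It is a deliberately simplified stand-in and not the kohonen R package used in the initial case study: the grid here is rectangular and non-toroidal, training is done one case at a time, and the learning-rate and neighbourhood schedules as well as all function names are assumptions chosen for brevity.

```python
import numpy as np


def alr(concentrations, divisor_index):
    """Additive logratio transform with one column as common divisor."""
    logs = np.log(np.asarray(concentrations, dtype=float))
    ratios = logs - logs[:, [divisor_index]]
    return np.delete(ratios, divisor_index, axis=1)


def train_som(data, rows=10, cols=10, n_iter=2000, seed=0):
    """Train a small rectangular SOM and return its codebook vectors."""
    rng = np.random.default_rng(seed)
    codebook = rng.normal(size=(rows * cols, data.shape[1]))  # random start
    grid = np.array(
        [(r, c) for r in range(rows) for c in range(cols)], dtype=float
    )
    sigma_start = max(rows, cols) / 2.0
    for step in range(n_iter):
        progress = step / n_iter
        rate = 0.5 * (1.0 - progress) + 0.01           # shrinking learning rate
        sigma = sigma_start * (1.0 - progress) + 0.5   # shrinking neighbourhood
        case = data[rng.integers(len(data))]
        # Best matching unit by squared Euclidean distance.
        bmu = np.argmin(((codebook - case) ** 2).sum(axis=1))
        grid_dist_sq = ((grid - grid[bmu]) ** 2).sum(axis=1)
        influence = np.exp(-grid_dist_sq / (2.0 * sigma**2))
        codebook += rate * influence[:, None] * (case - codebook)
    return codebook


def map_cases(data, codebook):
    """Index of the best matching codebook vector for every case."""
    dist_sq = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist_sq.argmin(axis=1)


# Placeholder data: 60 cases with 6 positive "concentrations"; column 2
# plays the role of the common divisor (Fe in the text).
rng = np.random.default_rng(1)
raw = rng.lognormal(mean=3.0, sigma=0.4, size=(60, 6))
transformed = alr(raw, divisor_index=2)

codebook = train_som(transformed, seed=0)
units = map_cases(transformed, codebook)
```

In this sketch, passing the same `seed` yields the same codebook on every run, which is one possible way to make an individual SOM repeatable. It does not remove the dependence of the map on the chosen starting values.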

4.2. Supervised Machine Learning

The supervised machine learning methods were tested first with the dataset Koan_AMPH. The compositional categories in this assemblage have been investigated in previous studies, in which 75 of the Hellenistic amphora fragments have been analysed with NAA allowing for assigning compositional patterns to specific origins of production (Hein et al. 2008). Seven categories were defined in a training dataset with 201 cases of pXRF measurements of 58 fragments from Halasarna and Kos Town. These categories represent production places in South Central Kos, in the Northwest of the island and potentially also in the opposite coast of Asia Minor. For normalization the data were center logratio transformed.

In a first approach supervised machine learning was tested with SVM and RF models based on the seven Koan amphora categories (Table 1). For a series of SVM models different kernels were tested (linear, polynomial and gaussian), optimizing the regularization parameter C. For the RF models, on the other hand, the criteria ‘gini’, ‘entropy’ and ‘log_loss’ were tested with predefined maximum tree depths between four and six. The present RF model is based on the ‘log_loss’ criterion and a maximum depth of six, and it was tested on normalized and non-normalized compositional data. The number of trees in the forest was left at the default of 100 and the maximum number of features to be considered in an individual tree was eight. The dataset of 201 cases was divided into 140 cases for training (70%) and 61 for testing (30%). Both algorithms, SVM with polynomial or gaussian kernels as well as RF with normalized data, provided an accuracy, weighted precision and weighted recall of 100.0%, taking into account true/false positive classifications as well as true/false negative classifications (Table 3). The classification of the cases in the Koan_AMPH dataset indicated a general agreement of the SVM and RF models (Figure 1 a and b). The main divergence concerned the classification of cases to categories Kos-A and Kos-D (Figure 1 c). The SVM model apparently tended to classify cases not included in the training data as Kos-A, while the RF model classified them as Kos-D. Apart from this, though, it has to be considered that cases can only be classified according to the predefined categories in the training data, and thus a considerable number of false positive classifications has to be expected, which may represent imports from other islands or the coast of Asia Minor. For this reason, the training dataset was extended with seven categories of Rhodian amphorae, which had been found in Rhodes Town. 
Some of these Rhodian amphorae, though, had been supposedly manufactured in the Rhodian Peraia in the Western part of the Knidian Peninsula. Furthermore, a series of nine other supposedly Aegean amphora categories was included in the extended training dataset, which apart from the seven Samian amphorae, which were analysed actually in Samos, were found in the Paphos Agora. The dataset of 460 cases with known NAA categories of 145 amphora fragments was divided in 322 cases for training and 138 cases for testing. The accuracy of both SVM and RF models was still at 93.5% to 98.6 % indicating feasible classification and clear distinction among pottery productions on the Eastern Aegean islands and the coast of Asia Minor (Figure 1 d and e) (Table 3). The lower accuracy of the RF model of the normalized data was mainly related to the Sam-A category, which could not be predicted in the model. Expectedly, most of the cases in the Koan_AMPH dataset were classified according to the initial seven Koan categories. Only a small number of cases was classified according to Rhodian amphora categories, which, however, assumedly represent origins of manufacture in Asia Minor or the Knidian Peninsula, such as Rho-B, Rho-E and Kni-A as well as a number of cases classified as Nik-A, which is a hitherto non-localized category (Figure 1 f). The comparison of SVM and RF models, furthermore, indicates ambiguous classifications of the Koan categories with Rho-E and Kni-A, which presumably was the result of the small number of cases representing the latter categories in the training dataset. Eventually, ten further categories were included in the training data set representing origins of manufacture in the Western Mediterranean and Cyprus with 637 cases representing 188 individual fragments, so that the data sets for training and testing were extended to 445 and 192, respectively. 
The test data still indicated a sufficient accuracy of 92.7% to 93.8 % of both the SVM and the RF models (Figure 1 g and h) (Table 3). The models were tested for their prediction potential by predicting categories in the Paphos_AMPH dataset (Figure 1 i). The classifications of the two models indicated agreement in categorization of some of the larger groups, such as Kos-A, Kos-D, Kni-A, Chi-A, Eph-A, Nik-A, Tha-A and Sic-A, while the categorization of the Cypriot amphora groups (Pap-# and Kou-#) appears to be rather ambiguous.
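The evaluation steps described above, a 70/30 split scored by accuracy, weighted precision and weighted recall, followed by a comparison of the categories predicted by two models, can be sketched as follows. The stratified split, the helper names and all example labels are assumptions for illustration. The category names in the small agreement example are borrowed from the text, but the listed predictions are invented.

```python
from collections import Counter

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def evaluate(model, X, y, seed=0):
    """Fit on 70% of the cases and score the remaining 30%."""
    # Assumption: the split is stratified so every category is tested.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    y_pred = model.fit(X_train, y_train).predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "weighted_precision": precision_score(
            y_test, y_pred, average="weighted", zero_division=0
        ),
        "weighted_recall": recall_score(
            y_test, y_pred, average="weighted", zero_division=0
        ),
    }


def cross_tabulate(pred_a, pred_b):
    """Count how often model A's category coincides with model B's."""
    return Counter(zip(pred_a, pred_b))


def agreement_rate(pred_a, pred_b):
    """Fraction of cases that both models assign to the same category."""
    return float(np.mean(np.asarray(pred_a) == np.asarray(pred_b)))


# Placeholder training data: three categories, 16 attributes.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 16))
X = np.vstack([c + rng.normal(size=(30, 16)) for c in centers])
y = np.repeat(["cat-0", "cat-1", "cat-2"], 30)
scores = evaluate(SVC(kernel="rbf"), X, y)

# Invented predictions of two models for four unknown cases.
pred_svm = ["Kos-A", "Kos-A", "Kos-D", "Rho-E"]
pred_rf = ["Kos-A", "Kos-D", "Kos-D", "Rho-E"]
table = cross_tabulate(pred_svm, pred_rf)
rate = agreement_rate(pred_svm, pred_rf)  # 3 of 4 cases agree -> 0.75
```

The counts in `table` correspond to the cells of a comparison heatmap between two models: entries with identical labels lie on the diagonal, while off-diagonal entries such as ("Kos-A", "Kos-D") mark cases on which the models disagree.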

The small number of cases representing some of the categories as well as the implicit imbalance of case numbers in the training dataset appeared to be an issue affecting the accuracy of the classification of new data. For this reason, the generation of synthetic data based on the existing training data and their estimated compositional variation was tested. Figure 2 presents the Ti and V concentrations of training dataset cases representing three of the Koan amphora categories. Based on these concentrations, synthetic datasets of 1000 cases each were generated with three different approaches. The data in Figure 2b are based on random mixtures of cases representing the respective categories (Barone et al. 2019), while the data in Figure 2c are random concentrations drawn from the normal distributions estimated for the individual element concentrations, presenting a more realistic variation. The apparent correlation of the element concentrations, though, appears to be effectively considered only if the synthetic data are randomly generated based on the covariance matrix (Figure 2d). For this reason, a synthetic training dataset was generated based on the covariance matrices of the original categories. As the number of cases used for determining a covariance matrix should be at least equal to the number of attributes, in the present case 16, artificial new cases were created for the categories with insufficient numbers of cases by introducing random fuzziness based on the individual standard deviations. Furthermore, negative modeled concentration values were replaced with random values between 0 and the minimum values of the original data in order to allow for clr normalization.
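The covariance-based approach can be sketched with NumPy for a single category as shown below. This is one possible reading of the procedure and not the authors' code: the function name `synthesize_category`, the way small categories are padded (jittered copies of randomly picked cases), the requirement of at least two original cases and the placeholder concentrations are all assumptions.

```python
import numpy as np


def synthesize_category(cases, n_synthetic=1000, seed=0):
    """Draw synthetic cases for one category from its mean and covariance."""
    rng = np.random.default_rng(seed)
    original = np.asarray(cases, dtype=float)   # needs at least two cases
    n_cases, n_attr = original.shape
    pool = original
    if n_cases < n_attr:
        # Assumption: pad small categories with jittered copies of existing
        # cases, using the per-element standard deviations as noise scale.
        std = original.std(axis=0, ddof=1)
        n_extra = n_attr - n_cases
        picks = original[rng.integers(n_cases, size=n_extra)]
        pool = np.vstack(
            [original, picks + rng.normal(scale=std, size=(n_extra, n_attr))]
        )
    mean = pool.mean(axis=0)
    cov = np.cov(pool, rowvar=False)
    synthetic = rng.multivariate_normal(mean, cov, size=n_synthetic)

    # Replace non-positive values with random values between 0 and the
    # minimum of the original data so that logratios remain defined.
    floor = original.min(axis=0)
    replacement = rng.uniform(0.0, floor, size=synthetic.shape)
    bad = synthetic <= 0
    synthetic[bad] = replacement[bad]
    return synthetic


# Placeholder category: 20 cases with 3 positive "concentrations".
rng = np.random.default_rng(2)
category = rng.lognormal(mean=4.0, sigma=0.2, size=(20, 3))
synthetic_cases = synthesize_category(category, n_synthetic=1000)

# A category with fewer cases than attributes is padded before sampling.
small_category = category[:2]
synthetic_small = synthesize_category(small_category, n_synthetic=1000)
```

Because the samples are drawn from the full covariance matrix, correlated elements stay correlated in the synthetic data, which is the property that distinguishes this approach from sampling each element independently.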

The synthetic training dataset was divided into 23100 cases for training and 9900 cases for testing. While the accuracy of the SVM models increased up to 99.9%, the RF models presented some inaccuracies concerning synthetic data generated from categories with a small initial number of cases. The agreement between the classification based on the SVM model and the classification based on the RF model did not improve, most probably due to the problems with the RF model.

Eventually, the increased number of cases in the generated synthetic training dataset allowed for testing machine learning using a sequential ANN model. The ANN model was tested with normalized as well as non-normalized data with additional normalization of the input layer. The data were forwarded to two hidden layers, both with Rectified Linear Unit (‘relu’) activation. For the first hidden layer unit numbers of 64, 128 and 256 were tested, while the second hidden layer comprised 32 units. From here the data were forwarded to the output layer, in which the units corresponding to the predefined categories were activated with the ‘softmax’ function providing an estimation of the most probable category. First, the ANN was trained with the synthetic training data set of the 23 categories of Eastern Aegean transport amphorae tested already above with the SVM and RF models. For compiling the present ANN model ‘sparse_categorical_crossentropy’ was selected as loss function and the ‘Adam’ optimizer with a learning rate of 0.00001. The model was fitted in 1000 steps using like in the previous ML models 70% of the dataset for training and 30% for testing. The final accuracy of the training was indicated with 99.7% to 99.9% (Table 4). Figure 3 presents a heatmap of the Koan_AMPH dataset classifications according to the ANN model with the normalized input layer and 256 units in the first hidden layer with the above reported classification according to a SVM model with a gaussian kernel (‘rbf’) (Figure 3, left). There is a general agreement concerning the Koan amphora categories apart from a few cases of the main category Kos-A classified by the SVM model and identified as Kos-D/Kos-D2 or even as imports from Asia Minor and the Knidian Peninsula (Rho-E and Knidos) by the ANN model or cases classified as Category Kos-D by the SVM model and Kos-A or Rho-E by the ANN model. 
Apart from this, the two ML models agree on individual cases representing an origin of manufacture from Chios and Ephesos as well as on the Nikandros ware. For assessing the potential of the ML models in predicting categories, the predictions for stamped amphora handles are of particular interest, as they represent specific amphora workshops. Table 5 lists the categories predicted by the different ML models for four stamped amphora handles excavated in South-Central Kos, and for two Hellenistic as well as five Roman Koan amphora handles from the collection of the National Museum in Athens. While the four stamped amphora handles from South-Central Kos can be clearly assigned to Kos-A, the main category from this area, the predictions for the two Hellenistic amphora handles from Attica appear to be ambiguous. Three of the four measurements of Sample ATH-H-17-01 indicate Category Kos-C. The fourth measurement indicates Kos-C only in some of the ANN models, but Tha-A in the SVM models. In the case of Sample ATH-H-17-02, on the other hand, there appears to be confusion between Categories Kos-C and Kos-D2, which might possibly indicate another Koan category that has not been included in the dataset. Concerning the Roman amphora handles, two of them, ATH-R-25-02 and ATH-R-25-04, can be categorized as Kos-A and Kos-D, respectively, based on the ML models. ATH-R-25-05 indicates more or less clearly Category Rho-B, which can probably be localized in Knidos. The remaining two handles present a rather unclear categorization, which might be related to missing training categories as well.
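The layer structure of the ANN described above (normalized input, two ‘relu’ hidden layers with 256 and 32 units, ‘softmax’ output over 23 categories, sparse categorical cross-entropy loss) can be sketched as a single forward pass. The sketch deliberately uses plain NumPy with random, untrained weights instead of the TensorFlow model of the study; the number of input variables (`n_features = 10`), the five input rows and the example labels are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 10     # assumption: number of element concentrations per case
n_categories = 23   # Eastern Aegean categories named in the text
n_hidden1 = 256     # 64 and 128 units were tested as well
n_hidden2 = 32


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Random placeholder weights; training would adjust these values.
W1 = rng.normal(0.0, 0.1, (n_features, n_hidden1))
b1 = np.zeros(n_hidden1)
W2 = rng.normal(0.0, 0.1, (n_hidden1, n_hidden2))
b2 = np.zeros(n_hidden2)
W3 = rng.normal(0.0, 0.1, (n_hidden2, n_categories))
b3 = np.zeros(n_categories)


def forward(X, mean, std):
    X = (X - mean) / std        # normalization of the input layer
    h1 = relu(X @ W1 + b1)      # first hidden layer
    h2 = relu(h1 @ W2 + b2)     # second hidden layer
    return softmax(h2 @ W3 + b3)  # one probability per category


X = rng.lognormal(size=(5, n_features))  # five made-up measurements
mean, std = X.mean(axis=0), X.std(axis=0)
probs = forward(X, mean, std)
predicted = probs.argmax(axis=1)         # most probable category per case

# Sparse categorical cross-entropy for made-up integer labels
labels = np.array([0, 3, 3, 7, 22])
loss = -np.log(probs[np.arange(len(labels)), labels]).mean()
print(predicted, round(float(loss), 3))
```

Because the ‘softmax’ output sums to one for each case, the full probability vector, and not only the most probable category, can be inspected when a prediction appears ambiguous.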

Following the evaluation of the Koan_AMPH dataset, a larger ANN model was compiled with the full synthetic training dataset representing 33 predefined categories in order to evaluate the Paphos_AMPH dataset (Table 1). The ANN model was tested again with a normalized input layer and a first hidden layer comprising 64, 128 or 256 units. The accuracy of the ANN models was between 99.4% and 99.8%. Figure 3 (right) presents the categorization of the Paphos_AMPH dataset, comparing again the SVM model with a Gaussian kernel and the ANN model with 256 units in the first hidden layer. The two models provide a largely concordant categorization of the dataset with only a few categories missing, such as Kos-E, Rho-C and Sam-A in the ANN model and Rho-C in the SVM model. Again, there appears to be a slight uncertainty concerning Category Kos-D of the SVM model, some cases of which are assigned to different Aegean categories by the ANN model. The Cypriot amphora categories are clearly separated, apart from the cases in Category Pap-C in the ANN model, which appear in different, mainly Rhodian, categories in the SVM model. This might be related to a similar geological context. Table 6 lists the categories predicted by the different ML models for eleven stamped amphora handles excavated in South-Western Cyprus and for four fragments of basket-handled amphorae, a specific amphora type assigned to the Eastern Mediterranean. The two stamped Kourion handles can be clearly categorized as Kou-C and Kou-D, respectively. On the other hand, only one of the two stamped Knidian handles is categorized as Kni-A, while the other rather indicates a Chian origin. The stamped Ephesian amphora handle shows an unclear categorization, which might again be due to a missing category from the coast of Asia Minor in the training data.
A potential Ephesian origin, however, is indicated for one of the Thasian stamped amphora handles, while the other two are clearly categorized as Tha-A. Finally, an ambiguous categorization is indicated for the three Sinopian stamped handles. In this case, however, it has to be considered that amphora categories from the Black Sea are so far missing from the training data. Missing training data, such as from Cilicia or the Syro-Palestinian region, might also explain the rather unclear categorization of the basket-handled amphorae. In one case even a Western Mediterranean origin is indicated, which, however, can most probably be considered a false positive categorization.
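The comparison of two models shown as heatmaps in Figure 3 rests on a cross-tabulation of their predictions for the same cases. A minimal sketch of such a table and of the overall agreement rate is given below. The category names are taken from the text, but the six predictions are invented for illustration and do not reproduce any result of the study; the helpers `cross_tabulate` and `agreement` are hypothetical.

```python
from collections import Counter


def cross_tabulate(pred_a, pred_b):
    """Count how often model A's category co-occurs with model B's."""
    counts = Counter(zip(pred_a, pred_b))
    rows = sorted(set(pred_a))
    cols = sorted(set(pred_b))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def agreement(pred_a, pred_b):
    """Fraction of cases assigned to the same category by both models."""
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Invented predictions for six cases (illustration only)
svm_pred = ["Kos-A", "Kos-A", "Kos-A", "Kos-D", "Kos-D", "Rho-E"]
ann_pred = ["Kos-A", "Kos-A", "Kos-D", "Kos-D", "Kos-A", "Rho-E"]

rows, cols, table = cross_tabulate(svm_pred, ann_pred)
for name, row in zip(rows, table):
    print(name, row)
print("agreement:", agreement(svm_pred, ann_pred))
```

Off-diagonal counts in such a table point directly to the pairs of categories that the two models confuse, for example Kos-A and Kos-D in the invented data.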

Machine Learning (ML) allows for automated and objective categorization of archaeological materials and objects based on different attributes, such as, in the present case, quantitative data from elemental analysis. Compositional data collected with laboratory analyses providing high precision and accuracy can be effectively evaluated with multivariate statistics, assessing similarities and dissimilarities among patterns. The potential of these approaches for categorizing compositional data collected with pXRF, however, is limited due to the analytical uncertainties obscuring compositional distinctions. On the other hand, the application of non-invasive methods, such as pXRF, provides fundamental advantages in terms of data collection. Unsupervised ML models, such as SOM, have demonstrated that ML might provide a more detailed categorization of pXRF data compared with conventional multivariate statistics. The strengths of both analytical approaches, highly precise and accurate laboratory analysis and non-invasive analysis of a large number of cases, can be combined in supervised ML models. For the automated classification of pXRF data, ML models can be trained with training datasets defining categories based on laboratory analyses. ML algorithms, such as support vector machines (SVM) or random forest (RF), allow for an effective distinction of cases in the training data. However, they appear to be limited by ambiguous classifications of unknown cases. These ambiguities can be compensated to a large extent by using synthetic training data generated from the individual covariance matrices of the categories. Moreover, the synthetic data provide an appropriate basis for training artificial neural networks (ANN) as well, which require large or ‘Big’ data. Certainly, the utilization of real data for training would be preferable, but for now they are not available to a sufficient extent. For this reason, the utilization of synthetic training data that take into account correlations among element concentrations provides a valuable alternative approach.
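The generation of synthetic cases from the covariance matrix of a category can be sketched as follows. The sketch assumes that the cases of a category are drawn from a multivariate normal distribution defined by the mean vector and covariance matrix of the reference cases; the text does not specify the distribution or any prior data transformation, so this is one plausible reading rather than the procedure of the study. The reference data (30 cases, four variables) are made up, and `synthesize_category` is a hypothetical helper.

```python
import numpy as np


def synthesize_category(reference, n_cases, rng):
    """Draw synthetic cases that follow the mean and covariance of one
    category (assumption: multivariate normal distribution)."""
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)  # variables are in columns
    return rng.multivariate_normal(mean, cov, size=n_cases)


rng = np.random.default_rng(42)

# Made-up reference category: 30 cases, 4 correlated variables
mixing = np.array([[1.0, 0.8, 0.0, 0.0],
                   [0.0, 0.6, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.5],
                   [0.0, 0.0, 0.0, 1.0]])
offsets = np.array([2.0, 1.0, 0.5, 3.0])
reference = rng.normal(size=(30, 4)) @ mixing + offsets

synthetic = synthesize_category(reference, 1000, rng)

# The correlation between the first two variables carries over
corr_reference = np.corrcoef(reference, rowvar=False)[0, 1]
corr_synthetic = np.corrcoef(synthetic, rowvar=False)[0, 1]
print(synthetic.shape, round(corr_reference, 2), round(corr_synthetic, 2))
```

Sampling from the full covariance matrix, rather than varying each element independently, is what preserves the correlations among element concentrations in the synthetic cases.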

A drawback of the ML algorithms tested in the present case study, as with ML models in general, is that the specific categorization criteria are not apparent. The ML models might be affected by the random initialization of starting parameters or by bias introduced by the training data. The validation of the ML models is an ongoing process rather than a concluded assessment. Extending and complementing the pXRF data repository with new measurements might confirm and validate categories, and it might reveal new, hitherto unknown categories as well. The ultimate aim is to assemble an expert system for the automated classification of transport amphorae in terms of origin of manufacture, which can be used for investigating trade routes in the Eastern Mediterranean.

Funding: not applicable

Conflicts of Interest: The author declares no conflicts of interest.

Ethics approval: not applicable

Consent to participate: not applicable

Consent for publication: not applicable

Data availability: All data are included as supplementary data

Code availability: The Python code applied, which uses the scikit-learn and TensorFlow libraries, can be made accessible on GitHub on request.

Author's contribution: AH as single author is responsible for the content and editing of the manuscript.

Allegretta I, Marangoni B, Manzari P, Porfido C, Terzano R, De Pascale O, Senesi G S (2020) Macro-classification of meteorites by portable energy dispersive X-ray fluorescence spectroscopy (pED-XRF), principal component analysis (PCA) and machine learning algorithms, Talanta 212: 120785. DOI: 10.1016/j.talanta.2020.120785
Andric V, Kvascev G, Cvetanovic M, Stojanovic S, Bacanin N, Gajic‑Kvascev M (2024) Deep learning assisted XRF spectra classification, Sci Reports 14: 3666. DOI: 10.1038/s41598-024-53988-z
Barone G, Mazzoleni P, Spagnolo GV, Raneri S (2019) Artificial neural network for the provenance study of archaeological ceramics using clay sediment database, J. Cult. Heritage 38: 147–157. DOI: 10.1016/j.culher.2019.02.004
Baxter MJ (2001) Statistical Modelling of Artefact Compositional Data, Archaeometry 43 (1) 131-147. DOI: 10.1111/1475-4754.00008
Baxter MJ, Freestone IC (2006) Log-ratio compositional data analysis in archaeometry, Archaeometry 48(3), 511–531. DOI:10.1111/j.1475-4754.2006.00270.x
Baxter MJ, Beardah CC, Papageorgiou I, Cau MA, Day PM, Kilikoglou V (2008) On statistical approaches to the study of ceramic artefacts using geochemical and petrographic data, Archaeometry 50 (1): 142 – 157. DOI: 10.1111/j.1475-4754.2007.00359.x
Brokamp C, Jandarov R, Rao M B, LeMasters G, Ryan P (2017) Exposure assessment models for elemental components of particulate matter in an urban environment: A comparison of regression and random forest approaches, Atmospheric Env. 151: 1-11. DOI: 10.1016/j.atmosenv.2016.11.066
Buxeda i Garrigos J (1999) Alteration and contamination of archaeological ceramics: the perturbation problem, J. Arch. Sci. 26: 295–313. DOI: 10.1006/jasc.1998.0390
Chollet F (2017) Deep Learning with Python. Manning Publications, Shelter Island.
Fiorucci M, Khoroshiltseva M, Pontil M, Traviglia A, Del Bue A, James S (2020) Machine Learning for Cultural Heritage: A Survey, Pattern Recognition Letters 133: 102–108. DOI: 10.1016/j.patrec.2020.02.017
Glascock MD, Neff H (2003) Neutron activation analysis and provenance research in archaeology, Meas. Sci. Tech. 14:1516–1526. DOI: 10.1088/0957-0233/14/9/304
Goren Y, Mommsen H, Klinger J (2011) Non-destructive provenance study of cuneiform tablets using portable X-ray fluorescence (pXRF), J. Arch. Sci. 38 (3): 684-696. DOI: 10.1016/j.jas.2010.10.020
Hazenfratz R, Munita CS, Neves EG (2017) Neural Networks (SOM) Applied to INAA Data of Chemical Elements in Archaeological Ceramics from Central Amazon, STAR: Sci. Tech. Arch. Res. 3: 334–340. DOI: 10.1080/20548923.2018.1470218
Hein A (2021) Revisiting the groups – Exploring the feasibility of portable EDXRF in provenance studies of transport amphorae in the Eastern Aegean. In M. Hegewisch, M. Dazkiewicz, G. Schneider (Eds.) Application of portable energy-dispersive X-ray fluorescence to the analysis of archaeological ceramics and glass. Topoi - Berlin Studies of the Ancient World, Berlin, pp. 43-61. DOI: 10.17171/3-75
Hein A, Tsolakidou A, Iliopoulos I, Mommsen H, Buxeda i Garrigόs J, Montana G, Kilikoglou V (2002) Standardisation of elemental analytical techniques applied to provenance studies of archaeological ceramics: An inter laboratory calibration study, The Analyst 127: 542-553. DOI: 10.1039/b109603f
Hein A, Georgopoulou V, Nodarou E, Kilikoglou V (2008) Koan amphorae from Halasarna – investigations in a Hellenistic amphora production centre, J. Arch. Sci. 35 (4): 1049-1061. DOI:10.1016/j.jas.2007.07.009
Hein A, Kilikoglou V (2012) ceraDAT – Prototype of a web-based relational database for archaeological ceramics, Archaeometry 54(2): 230–243. DOI: 10.1111/j.1475-4754.2011.00618.x
Hein A, Kilikoglou V (2017) Compositional variability of archaeological ceramics in the eastern Mediterraneanand implications for the design of provenance studies, J. Arch. Sci .Rep. 16: 564–572. DOI: 10.1016/j.jasrep.2017.03.020
Hein A, Kilikoglou V (2020) Ceramic Raw Materials: how to recognize them and locate the supply basins: chemistry, Arch. Anthropol. Sci. 12: 180. DOI: 10.1007/s12520-020-01129-8
Hein A, Dobosz A, Day PM, Kilikoglou V (2021) Portable ED-XRF as a tool for optimizing sampling strategy: The case study of a Hellenistic amphora assemblage from Paphos (Cyprus), J. Arch. Sci. 133: 105436. DOI: 10.1016/j.jas.2021.105436
Hein A, Kilikoglou V (2024) Categorization of archaeological ceramics based on their elemental composition using self organizing maps. In A. Hein (ed) Big Data in Archaeology -Proceedings of the 4^thCAA-GR 2021. N.C.S.R. “Demokritos”, Athens: 116-122.
Holmqvist E (2016) Handheld portable energy-dispersive X-ray fluorescence spectrometry (pXRF). In A. M. W. Hunt (ed) The Oxford Handbook of Archaeological Ceramic Analysis. Oxford University Press, Oxford, pp 363-381.
Hunt AMW, Speakman RJ (2015) Portable XRF analysis of archaeological sediments and ceramics, J. Arch. Sci. 53, 626-638. DOI: 10.1016/j.jas.2014.11.031
Käser S, Vazquez-Salazar LI, Meuwly M, Töpfer K (2023) Neural network potentials for chemistry: concepts, applications and prospects, Digital Discovery 1: 2,28–58. DOI: 10.1039/d2dd00102k
Kodikara GRL, McHenry LJ, Stanistreet IG, Stollhofen H, Njau JK, Toth N, Schick K (2024) Wide & deep learning for predicting relative mineral compositions of sediment cores solely based on XRF scans, a case study from Pleistocene Paleolake Olduvai, Tanzania, Artificial Int Geosci 5: 100088. DOI: 10.1016/j.aiig.2024.100088
Kohonen T (1995) Self-Organizing Maps, Springer-Verlag, Berlin. DOI:10.1007/978-3-642-97610-0.
Mommsen H, Beier T, Hein A (2002) A complete chemical grouping of the Berkeley neutron activation analysis on Mycenaean pottery, J. Arch. Sci. 29: 613–637. DOI:10.1006/jasc.2001.0759
Morgenstein M, Redmount CA (2005) Using portable energy dispersive X-ray fluorescence (EDXRF) analysis for on-site study of ceramic sherds at El Hibeh, Egypt, J. Arch. Sci. 32 (11): 1613-1623. DOI:10.1016/j.jas.2005.05.004
Papageorgiou I (2020) Ceramic investigation: how to perform statistical analyses, Arch. Anthropol. Sci. 12: 210. DOI: 10.1007/s12520-020-01142-x
Rieger L H, Wilson M, Vegge T, Flores E (2023) Understanding the patterns that neural networks learn from chemical spectra, Digital Discovery 2: 1957–1968. DOI: 101039/d3dd00203a
Ruschioni G, Malchiodi D, Zanaboni AM, Bonizzoni L (2023) Supervised learning algorithms as a tool for archaeology: Classification of ceramic samples described by chemical element concentrations, J. Arch. Sci. Reports 49: 103995. DOI: 10.1016/j.jasrep.2023.103995
Shugar AN, Drake, BL, Kelley G (2021) Rapid identification of wood species using XRF and neural network machine learning, Sci Reports 11: 17533. DOI: 101038/s41598-021-96850-2
Tite MS (2008) Ceramic Production, Provenance and Use – A Review, Archaeometry 50: 216–231. DOI: 10.1111/j.1475-4754.2008.00391.x
Wehrens R, Kruisselbrink J (2018) Flexible Self-Organizing Maps in kohonen 3.0, J. Stat. Software 87: 1-18. DOI: 10.18637/jss.v087.i07

Tables 1 to 6 are available in the Supplementary Files section.

No competing interests reported.


Reviewers agreed at journal: 16 Dec, 2025
Reviewers agreed at journal: 15 Dec, 2025
Reviewers invited by journal: 12 Dec, 2025
Editor assigned by journal: 10 Dec, 2025
Submission checks completed at journal: 09 Dec, 2025
First submitted to journal: 08 Dec, 2025

