4.1. Unsupervised Machine Learning
In an initial case study, the dataset Paphos_AMPH, comprising 366 pXRF measurements of 288 amphora fragments from Paphos, was evaluated with an SOM model (Hein and Kilikoglou 2024). To remain consistent with the previously published evaluation of the same dataset using conventional statistics (Hein et al. 2021), the number of attributes was reduced to fifteen by disregarding U. For normalization, an additive logratio (alr) transformation was applied to the element concentrations, with the Fe concentration as the common divisor. A toroidal 10×10 base map with hexagonal topology was defined. The codebook vectors were initialized with random values and then trained stepwise, using the squared Euclidean distance as the distance criterion. Based on hierarchical clustering of the resulting codebook vectors, 20 clusters were defined, corresponding to the number of categories expected from the NAA data available for part of the samples. The SOM clusters not only confirmed the clustering of the same dataset obtained with conventional exploratory statistics (Hein et al. 2021) but also allowed some of the original clusters to be subdivided, thus defining categories in greater detail, at a level similar to that of the NAA data. A drawback of this initial case study of unsupervised machine learning, though, was that, owing to the random initialization of the codebook vectors, the resulting SOM was not necessarily reproducible, and individual cases appeared ambiguous in terms of categorization.
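The alr normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name `alr_transform` and the dictionary input format (element symbol mapped to concentration) are assumptions.

```python
import math

def alr_transform(sample, divisor="Fe"):
    """Additive logratio (alr) transform of one measurement.

    `sample` maps element symbols to concentrations; every element except the
    divisor becomes ln(x_element / x_divisor), so D elements give D - 1 values.
    """
    if divisor not in sample:
        raise KeyError(f"divisor element {divisor!r} missing from sample")
    if any(v <= 0 for v in sample.values()):
        raise ValueError("logratios require strictly positive concentrations")
    ref = sample[divisor]
    return {el: math.log(v / ref) for el, v in sample.items() if el != divisor}
```

Because the unit cancels in each ratio, concentrations given in ppm or in % yield the same alr values, provided all elements of a measurement share one unit.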
4.2. Supervised Machine Learning
The supervised machine learning methods were first tested with the dataset Koan_AMPH. The compositional categories in this assemblage have been investigated in previous studies, in which 75 of the Hellenistic amphora fragments were analysed by NAA, allowing compositional patterns to be assigned to specific production sites (Hein et al. 2008). Seven categories were defined in a training dataset of 201 cases of pXRF measurements of 58 fragments from Halasarna and Kos Town. These categories represent production places in South-Central Kos and in the northwest of the island, and potentially also on the opposite coast of Asia Minor. For normalization, the data were subjected to a centred logratio (clr) transformation.
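As a minimal sketch of the clr normalization, each concentration is divided by the geometric mean of all concentrations of the same measurement before taking the logarithm. The function name `clr_transform` and the list input are illustrative assumptions.

```python
import math

def clr_transform(values):
    """Centred logratio (clr) transform: ln(x_i / g(x)), where g(x) is the
    geometric mean of all concentrations of one measurement."""
    if any(v <= 0 for v in values):
        raise ValueError("clr requires strictly positive concentrations")
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)  # logarithm of the geometric mean
    return [lv - mean_log for lv in logs]
```

The clr values of one measurement sum to zero, and multiplying all concentrations by a common factor leaves them unchanged, which is why strictly positive inputs are required but the unit is not.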
In a first approach, supervised machine learning was tested with SVM and RF models based on the seven Koan amphora categories (Table 1). For a series of SVM models, different kernels (linear, polynomial and Gaussian) were tested while optimizing the regularization parameter C. For the RF models, on the other hand, the split criteria ‘gini’, ‘entropy’ and ‘log_loss’ were tested with predefined maximum tree depths between four and six. The present RF model is based on the ‘log_loss’ criterion and a maximum depth of six, and it was tested on normalized and non-normalized compositional data. The number of trees in the forest was left at the default of 100, and the maximum number of features considered for the splits within an individual tree was eight. The dataset of 201 cases was divided into 140 cases for training (70%) and 61 for testing (30%). Both algorithms, SVM with polynomial or Gaussian kernels and RF with normalized data, achieved an accuracy, weighted precision and weighted recall of 100.0%; these metrics take into account true and false positive as well as true and false negative classifications (Table 3). The classifications of the cases in the Koan_AMPH dataset showed general agreement between the SVM and RF models (Figure 1 a and b). The main divergence concerned the assignment of cases to categories Kos-A and Kos-D (Figure 1 c): the SVM model apparently tended to assign cases not included in the training data to Kos-A, whereas the RF model assigned them to Kos-D. Beyond this, it has to be considered that cases can only be classified according to the categories predefined in the training data, so a considerable number of false positive classifications must be taken into account, possibly representing imports from other islands or from the coast of Asia Minor. To address this, the training dataset was extended with seven categories of Rhodian amphorae found in Rhodes Town.
Some of these Rhodian amphorae, however, were presumably manufactured in the Rhodian Peraia in the western part of the Knidian Peninsula. Furthermore, nine other presumably Aegean amphora categories were included in the extended training dataset; apart from the seven Samian amphorae, which were actually analysed in Samos, these were found in the Paphos Agora. The resulting dataset of 460 cases, representing 145 amphora fragments with known NAA categories, was divided into 322 cases for training and 138 for testing. The accuracy of both the SVM and the RF models remained between 93.5% and 98.6%, indicating feasible classification and a clear distinction among the pottery productions of the Eastern Aegean islands and the coast of Asia Minor (Figure 1 d and e) (Table 3). The lower accuracy of the RF model on the normalized data was mainly due to the Sam-A category, which the model failed to predict. As expected, most of the cases in the Koan_AMPH dataset were classified according to the initial seven Koan categories. Only a small number of cases were classified according to Rhodian amphora categories which presumably represent production in Asia Minor or on the Knidian Peninsula, such as Rho-B, Rho-E and Kni-A, and a number of cases were classified as Nik-A, a category that has not yet been localized (Figure 1 f). Furthermore, the comparison of the SVM and RF models indicates ambiguous classifications between the Koan categories and Rho-E and Kni-A, presumably resulting from the small number of cases representing the latter categories in the training dataset. Finally, ten further categories representing production in the Western Mediterranean and Cyprus were added to the training dataset, giving 637 cases from 188 individual fragments, so that the training and test sets were extended to 445 and 192 cases, respectively.
On the test data, both the SVM and the RF models still achieved a sufficient accuracy of 92.7% to 93.8% (Figure 1 g and h) (Table 3). The predictive potential of the models was then tested by predicting categories for the Paphos_AMPH dataset (Figure 1 i). The two models agreed in the categorization of some of the larger groups, such as Kos-A, Kos-D, Kni-A, Chi-A, Eph-A, Nik-A, Tha-A and Sic-A, whereas the categorization of the Cypriot amphora groups (Pap-# and Kou-#) appears rather ambiguous.
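The accuracy, weighted precision and weighted recall reported for these tests can be sketched in plain Python, assuming the usual support-weighted averaging over categories. This is an illustrative sketch, not the evaluation code of the study: the function name `evaluate` is hypothetical, and setting the precision of a never-predicted category to 0 is an assumption. The per-category recall exposes a situation like the Sam-A category above, which the RF model never predicted.

```python
from collections import Counter

def evaluate(y_true, y_pred):
    """Accuracy, support-weighted precision/recall and per-category recall."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(y_true)
    support = Counter(y_true)    # true test cases per category
    assigned = Counter(y_pred)   # cases the model assigned to each category
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
    recall = {cat: hits[cat] / n_cat for cat, n_cat in support.items()}
    # assumption: precision of a category the model never predicts is 0.0
    precision = {cat: hits[cat] / assigned[cat] if assigned[cat] else 0.0
                 for cat in support}
    return {
        "accuracy": sum(hits.values()) / n,
        "weighted_precision": sum(support[c] / n * precision[c] for c in support),
        "weighted_recall": sum(support[c] / n * recall[c] for c in support),
        "recall": recall,
    }
```

Because each category's recall is weighted by its share of the test cases, the weighted recall always equals the accuracy; the weighted precision, in contrast, can differ from it.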
The small number of cases representing some of the categories, as well as the inherent imbalance of case numbers in the training dataset, appeared to affect the accuracy of the classification of new data. Therefore, the generation of synthetic data based on the existing training data and their estimated compositional variation was tested. Figure 2 presents the Ti and V concentrations of training dataset cases representing three of the Koan amphora categories. Based on these concentrations, synthetic datasets of 1000 cases each were generated using three different approaches. The data in Figure 2b are based on random mixtures of cases representing the respective categories (Barone et al. 2019), whereas the data in Figure 2c are random concentrations drawn from the normal distributions estimated for the individual element concentrations, presenting a more realistic variation. However, the evident correlation between the element concentrations appears to be effectively reproduced only if the synthetic data are randomly generated from the covariance matrix (Figure 2d). Accordingly, a synthetic training dataset was generated based on the covariance matrices of the original categories. Since determining a covariance matrix requires at least as many cases as attributes, in the present case 16, additional artificial cases were created for the categories with insufficient numbers of cases by introducing random fuzziness based on the individual standard deviations. Furthermore, negative modeled concentration values were replaced with random values between 0 and the minimum value of the original data in order to allow for clr normalization.
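The covariance-based generation of synthetic cases can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the authors' implementation: it draws cases as mean + L·z, where L is the Cholesky factor of the category's covariance matrix and z is standard normal noise, and replaces non-positive draws as described above. The padding scheme in `pad_with_fuzz` (Gaussian jitter of half a standard deviation around randomly chosen existing cases) is a hypothetical reading of "random fuzziness based on the individual standard deviations"; all function names are illustrative.

```python
import math
import random

def mean_and_covariance(rows):
    """Per-element means and sample covariance matrix (n - 1 denominator)."""
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("at least two cases are needed to estimate a covariance")
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return means, cov

def cholesky(a):
    """Lower-triangular L with L times its transpose equal to a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if a[i][i] - s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def pad_with_fuzz(rows, min_cases, rng, scale=0.5):
    """HYPOTHETICAL padding: append jittered copies of existing cases (Gaussian
    noise of `scale` times each element's standard deviation) until the category
    has `min_cases` cases; a non-positive jittered value keeps the original value."""
    _, cov = mean_and_covariance(rows)
    sds = [math.sqrt(cov[j][j]) for j in range(len(cov))]
    padded = [list(r) for r in rows]
    while len(padded) < min_cases:
        base = rng.choice(rows)
        jittered = [v + rng.gauss(0.0, scale * sd) for v, sd in zip(base, sds)]
        padded.append([new if new > 0.0 else old for new, old in zip(jittered, base)])
    return padded

def synthesize(rows, n_cases, rng):
    """Draw n_cases from a multivariate normal with the category's mean vector and
    covariance matrix; non-positive draws are replaced by a uniform random value
    between 0 and that element's minimum in `rows`, so clr remains possible."""
    means, cov = mean_and_covariance(rows)
    low = cholesky(cov)
    d = len(means)
    mins = [min(r[j] for r in rows) for j in range(d)]
    out = []
    for _ in range(n_cases):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        case = [means[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        out.append([v if v > 0.0 else rng.uniform(0.0, mins[j]) for j, v in enumerate(case)])
    return out
```

For the Cholesky step in this sketch, note that a sample covariance estimated from n cases has rank at most n - 1, so a positive-definite matrix needs at least one case more than there are attributes, and the cases must not be degenerate.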
The synthetic training dataset was divided into 23100 cases for training and 9900 cases for testing. While the accuracy of the SVM models increased to up to 99.9%, the RF models showed some inaccuracies for synthetic data generated from categories with a small initial number of cases. The agreement between the SVM-based and the RF-based classifications did not improve, most probably because of these problems with the RF model.
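The train/test sizes reported in this section (140/61 for 201 cases, 322/138 for 460, 445/192 for 637 and 23100/9900 for the 33000 synthetic cases) are all consistent with a 30% test share rounded up to whole cases. This is how scikit-learn's `train_test_split` sizes the test set when `test_size` is given as a float; the text does not state which tool was used, so this is an assumption. The helper name `split_sizes` is hypothetical.

```python
import math

def split_sizes(n_cases, test_fraction=0.3):
    """Train/test sizes when the test share is rounded up to whole cases."""
    n_test = math.ceil(test_fraction * n_cases)
    return n_cases - n_test, n_test
```

Rounding the test share up, rather than to the nearest integer, explains why 637 cases give 192 test cases (0.3 × 637 = 191.1).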
Finally, the increased number of cases in the synthetic training dataset made it possible to test machine learning with a sequential ANN model. The ANN model was tested with normalized as well as non-normalized data, with an additional normalization of the input layer. The data were passed to two hidden layers, both with rectified linear unit (‘relu’) activation. For the first hidden layer, 64, 128 and 256 units were tested, while the second hidden layer comprised 32 units. From there, the data were passed to the output layer, whose units correspond to the predefined categories and are activated with the ‘softmax’ function, providing an estimate of the most probable category. First, the ANN was trained with the synthetic training dataset of the 23 categories of Eastern Aegean transport amphorae already tested above with the SVM and RF models. For compiling the ANN model, ‘sparse_categorical_crossentropy’ was selected as the loss function, together with the ‘Adam’ optimizer with a learning rate of 0.00001. The model was fitted in 1000 steps, using 70% of the dataset for training and 30% for testing, as for the previous ML models. The final training accuracy was 99.7% to 99.9% (Table 4). Figure 3 (left) presents a heatmap comparing the classifications of the Koan_AMPH dataset by the ANN model with the normalized input layer and 256 units in the first hidden layer with the classification reported above by an SVM model with a Gaussian kernel (‘rbf’). There is general agreement concerning the Koan amphora categories, apart from a few cases assigned by the SVM model to the main category Kos-A but identified by the ANN model as Kos-D/Kos-D2 or even as imports from Asia Minor and the Knidian Peninsula (Rho-E and Knidos), and a few cases assigned to Category Kos-D by the SVM model but to Kos-A or Rho-E by the ANN model.
Apart from this, the two ML models appear to agree on individual cases representing production in Chios and Ephesos as well as on the Nikandros ware. For assessing the predictive potential of the ML models, the categories predicted for stamped amphora handles are of particular interest, as these represent specific amphora workshops. Table 5 lists the categories predicted by the different ML models for four stamped amphora handles excavated in South-Central Kos and for two Hellenistic and five Roman Koan amphora handles from the collection of the National Museum in Athens. While the four stamped amphora handles from South-Central Kos can be clearly assigned to Kos-A, the main category of this area, the predictions for the two Hellenistic amphora handles from Attica appear ambiguous. Three of the four measurements of Sample ATH-H-17-01 indicate Category Kos-C. The fourth measurement indicates Kos-C only according to some of the ANN models, but Tha-A according to the SVM models. In the case of Sample ATH-H-17-02, on the other hand, there appears to be confusion between Categories Kos-C and Kos-D2, which might indicate another Koan category not included in the dataset. Concerning the Roman amphora handles, two of them, ATH-R-25-02 and ATH-R-25-04, can be categorized by the ML models as Kos-A and Kos-D, respectively. ATH-R-25-05 more or less clearly indicates Category Rho-B, which can probably be localized in Knidos. The remaining two handles show a rather unclear categorization, which might likewise be related to categories missing from the training data.
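The arithmetic behind the ANN described above (two ‘relu’ hidden layers, a ‘softmax’ output layer and the ‘sparse_categorical_crossentropy’ loss) can be sketched in plain Python. This is an illustration, not a training implementation: it assumes 16 input attributes, one output unit per category, dense layers with one bias per unit, and it ignores the non-trainable statistics of the input normalization. All function names are hypothetical.

```python
import math

def dense_param_count(layer_sizes):
    """Weights plus biases of a stack of fully connected (dense) layers,
    given as [inputs, hidden_1, ..., outputs]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def relu(values):
    """Rectified linear unit: negative activations are set to zero."""
    return [max(0.0, v) for v in values]

def softmax(logits):
    """Convert output-layer scores into category probabilities summing to 1."""
    top = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, true_index):
    """Loss for one case whose true category is an integer index, not a one-hot vector."""
    return -math.log(probs[true_index])
```

Under these assumptions, the variants with 64, 128 and 256 units in the first hidden layer and 23 output categories have 3927, 7063 and 13335 trainable parameters, respectively. A prediction spread uniformly over 23 categories gives a loss of ln 23 (about 3.14), which the training drives down.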
Following the evaluation of the Koan_AMPH dataset, a larger ANN model was compiled with the full synthetic training dataset representing 33 predefined categories in order to evaluate the Paphos_AMPH dataset (Table 1). The ANN model was again tested with a normalized input layer and a first hidden layer comprising 64, 128 or 256 units. The accuracy of the ANN models was between 99.4% and 99.8%. Figure 3 (right) presents the categorization of the Paphos_AMPH dataset, again comparing the SVM model with a Gaussian kernel and the ANN model with 256 units in the first hidden layer. The two models provide a largely concordant categorization of the dataset, with only a few categories missing, such as Kos-E, Rho-C and Sam-A in the ANN model and Rho-C in the SVM model. Again, there appears to be slight uncertainty concerning Category Kos-D of the SVM model, part of whose cases are assigned to different Aegean categories by the ANN model. The Cypriot amphora categories are clearly separated, apart from the cases in Category Pap-C of the ANN model, which appear in various, mainly Rhodian, categories in the SVM model. This might be related to a similar geological context. Table 6 lists the categories predicted by the different ML models for eleven stamped amphora handles excavated in South-Western Cyprus and for four fragments of basket-handled amphorae, a specific amphora type assigned to the Eastern Mediterranean. The two stamped Kourion handles can be clearly categorized as Kou-C and Kou-D, respectively. On the other hand, only one of the two stamped Knidian handles is categorized as Kni-A, while the other rather indicates a Chian origin. The categorization of the stamped Ephesian amphora handle is unclear, which might again be due to a category from the coast of Asia Minor missing from the training data.
A potential Ephesian origin, however, is indicated for one of the Thasian stamped amphora handles, while the other two are clearly categorized as Tha-A. Finally, an ambiguous categorization is indicated for the three Sinopian stamped handles. In this case, however, it has to be considered that amphora categories from the Black Sea are so far missing from the training data. Missing training data, for instance from Cilicia or the Syro-Palestinian region, might likewise account for the rather unclear categorization of the basket-handled amphorae. In one case, a Western Mediterranean origin is even indicated, which, however, can most probably be considered a false positive categorization.
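The model comparisons discussed for Figure 3 (how many cases two models assign to the same category, how the SVM Kos-D cases are distributed over the ANN categories, and which categories a model never assigns) all follow from a cross-tabulation of two label lists. The following is a minimal sketch with hypothetical labels that merely borrow category names from the text; the function names are illustrative and the values are not results of the study.

```python
from collections import Counter

def cross_tabulate(labels_a, labels_b):
    """Counts of (category by model A, category by model B) for the same cases,
    i.e. the cells of a comparison heatmap, plus the overall agreement rate."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("both models must classify the same, non-empty set of cases")
    cells = Counter(zip(labels_a, labels_b))
    agreement = sum(n for (a, b), n in cells.items() if a == b) / len(labels_a)
    return cells, agreement

def row_shares(cells, category_a):
    """Distribution over model B's categories of the cases model A put in `category_a`."""
    row = {b: n for (a, b), n in cells.items() if a == category_a}
    total = sum(row.values())
    return {b: n / total for b, n in sorted(row.items())} if total else {}

def never_assigned(training_categories, labels):
    """Training categories to which a model assigns no case at all."""
    return sorted(set(training_categories) - set(labels))

# Hypothetical predictions for six cases
svm = ["Kos-D", "Kos-D", "Kos-D", "Kos-D", "Kos-A", "Pap-C"]
ann = ["Kos-D", "Kos-D", "Chi-A", "Rho-E", "Kos-A", "Pap-C"]
cells, agreement = cross_tabulate(svm, ann)
```

Here `row_shares(cells, "Kos-D")` shows that half of the SVM Kos-D cases stay in Kos-D under the ANN model, while the rest are spread over other Aegean categories.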