A Data Mining Perspective of XRF Elemental Analysis from Pueblo People’s Pottery

Received: October 10, 2019 Accepted: January 20, 2020 Published online: February 28, 2020
Hierarchical clustering was used to identify elemental signatures in artifacts attributed to the Pueblo peoples. The artifacts in this study are pottery samples found at different sites in the state of New Mexico, USA. Three linkage methods were applied: complete, average, and Ward. Their corresponding cophenetic correlation coefficients were used to contrast the three methods. Elemental characterization was based solely on X-ray fluorescence excited by a portable spectrometer with a silver anode. The elemental correlations disclosed here by data mining techniques are expected to guide further archaeological studies and assist experts in the assessment of provenance and in historical ethnographic studies.


Introduction
Pottery, especially that of ancient cultures, has an empirical basis, which makes its study a complex task. Pottery acquires its characteristics through the manufacturing process as well as from the raw materials from which it is made. It is perhaps one of the oldest incursions of early humans [2,9] into the field of what is today called materials science. Pottery can provide a window into the geographical location of the makers, their technological advances, their trading practices, and even, without any such intent on the makers' part, into the evolution of Earth's magnetic field [1].
Typically, the provenance of archaeological artifacts is established by means of a well-characterized reference sample; however, such a vantage point may not always be available to the researcher. The elemental composition and the other physical, mineralogical, and morphological characteristics of the reference sample are well known and are compared against those of unknown artifacts [27]. In the case of pottery, the composition of clay beds is also compared to that of pottery samples [18], but clay is a complex material with ample geographical distributions and variations [22,26]. An elemental composition match is a crucial factor in geographical provenance, but the lack of a match may not suffice to discard geographical provenance. To add to the complexity of asserting the provenance of pottery, the firing temperature can alter both the chemical and the mineralogical makeup. Likewise, the addition of temper will chemically alter the clay [18]. Determination of provenance consists not only of establishing geographical origin; it extends to the coordinates of time and ethnic origin. Information about the first two can mostly be gained from multiple analytical techniques [2,3,9,27], an example of which is X-ray fluorescence (XRF) analysis [4,8,15,16].
The present study deals with pottery sherds (Figure 1) found at four different sites in New Mexico, USA. The localities are presumed to have been inhabited by the Pueblo peoples, but also by other ethnic groups [5,6]. No other analytical techniques were employed, and no samples of established provenance were available for comparison with the findings of this study. This study is meant to present preliminary results that will help to classify the multiple samples collected. At this time we have not intended to provide an exhaustive treatment applying all clustering techniques, but rather to provide exploratory guidance using statistical techniques.

Sample Description and Methods
A total of 12 samples from the Pueblo peoples were analyzed. The samples originate from four locations in New Mexico: Three Rivers, Mimbres, Chupadero, and an unknown location. Their average size is about 3 × 4 cm², with thicknesses of some 3 to 4 mm. They were not processed in any way. Eight of the samples were probed on the back and front, and the rest were studied only on the front or the back, as indicated in Table 1. We used a portable X-ray excitation source, XMET 300TX from Oxford Instruments. The anode material is Ag, and the operational parameters were 40 kV and 7 µA for all samples. This instrument has an energy-dispersive (EDX) detector of the SDD type, and the spectra collected can be displayed using a 2048-channel multichannel analyzer (MCA) integrated in the portable X-ray source. For energy calibration we used the Ag K-alpha and the Fe K-alpha lines. The portable instrument was placed in close contact with the samples, from which only conspicuous dust was removed, and each sample was irradiated for 300 s. All spectra were analyzed by means of PyMCA, an open-source, publicly available software package developed at the European Synchrotron Radiation Facility [21]. Data reduction of the spectra yielded a collection of intensities that constitute a multidimensional elemental space of 20 elements in total. The intensities were then processed through clustering analysis as well as principal component analysis (PCA). Data treatment was accomplished by scripts developed in Python: sorting of PyMCA data, clustering, and PCA. Figure 1 shows the fluorescence spectrum corresponding to sample 0 (10532-O_back_3549). The spectrum is shown in logarithmic scale to highlight the quality of the fit. A blue line (continuous line) represents the raw spectrum, the continuum background is colored gray (vertical line), and the fit is the green line (cross).
The Compton portion of the spectrum is not included in the fit, but this is inconsequential to the analysis since all elemental peaks included in the fit lie well outside the Compton region, at about 21 keV. All fits were of similar quality, but they are not included in the manuscript.
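The two-line energy calibration mentioned above can be illustrated with a minimal sketch, assuming a linear relation between MCA channel and photon energy. The line energies (Fe K-alpha at about 6.40 keV, Ag K-alpha at about 22.16 keV) are standard tabulated values; the peak channel positions and the function names are hypothetical and do not come from the instrument or from PyMCA.

```python
# Two-point linear energy calibration: energy_keV = gain * channel + offset.
FE_KA_KEV = 6.40    # Fe K-alpha line energy (keV), tabulated value
AG_KA_KEV = 22.16   # Ag K-alpha line energy (keV), tabulated value


def two_point_calibration(ch_fe, ch_ag):
    """Return (gain, offset) from the channel centroids of the two lines."""
    gain = (AG_KA_KEV - FE_KA_KEV) / (ch_ag - ch_fe)
    offset = FE_KA_KEV - gain * ch_fe
    return gain, offset


def channel_to_energy(channel, gain, offset):
    """Convert an MCA channel number to energy in keV."""
    return gain * channel + offset


# Hypothetical peak centroids (channels), chosen for illustration only.
gain, offset = two_point_calibration(ch_fe=320.0, ch_ag=1108.0)
print(f"gain = {gain:.4f} keV/channel, offset = {offset:.4f} keV")
```

With these assumed centroids the gain is about 0.02 keV per channel, so a 2048-channel analyzer would span roughly 41 keV, which is compatible with a 40 kV tube voltage.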

Results and Discussion
The most prominent spectral contribution is that of iron. High concentrations of iron are not uncommon in pottery artifacts [1,24,25]. The presence of Fe may be structural, since Fe acts substitutionally for major elements like Al or Mg, but it could also be present when added as temper or in the form of a pigment [24].
Elemental fits of the spectra were organized for analysis (Supplementary material Tables S2-S4). The values are the integrated counts per 300 s. A scree plot of the PCA eigenvalues helped us identify the number of factors that we could retain to describe data dispersion. It would thus appear that two principal components describe the overall data tendencies (Figure 2). Figure 2 was created after removing the elemental column corresponding to Fe. We did that to better observe the elemental compositions that could guide classification of the samples: Sr, Zr, Ca, and Rb. Elemental concentrations have not been calibrated, but this is not relevant to the present treatment because a correspondence between counts and concentrations can always be established. The tabulated values constitute the elemental data matrix processed by hierarchical clustering techniques as well as PCA. In the hierarchical clustering analysis, the main technique employed, three methods were applied: complete, average, and Ward. In each case the same distance matrix was used as input.
We are interested in finding correlations among the samples based, at this time, solely on their elemental signatures. Clustering analysis [12,15,23] is a convenient approach to extracting correlations out of seemingly disconnected data. The data are expected to exhibit a connectivity pattern that arises from the metric and the clustering method introduced to sort the data. At the outset, and for the sake of being systematic, the same Euclidean metric was used to construct the distance matrix, which was subsequently clustered by each of the three methods.
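As a minimal sketch of the preprocessing described above (not the authors' actual scripts), the following Python fragment removes the Fe column from a samples-by-elements table of counts and builds the Euclidean distance matrix that would feed the clustering step. The element list, the counts, and the helper names `drop_element` and `euclidean_distance_matrix` are hypothetical.

```python
import math


def drop_element(elements, rows, name):
    """Remove one element column (e.g. 'Fe') from a samples-by-elements table."""
    idx = elements.index(name)
    kept = [e for i, e in enumerate(elements) if i != idx]
    reduced = [[v for i, v in enumerate(row) if i != idx] for row in rows]
    return kept, reduced


def euclidean_distance_matrix(rows):
    """Symmetric matrix of Euclidean distances between sample vectors."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            dist[i][j] = dist[j][i] = d
    return dist


# Hypothetical integrated counts for three samples (illustration only).
elements = ["Fe", "Ca", "Sr", "Zr"]
counts = [
    [9000, 300, 40, 0],
    [8000, 300, 43, 4],
    [9500, 306, 40, 8],
]

kept, reduced = drop_element(elements, counts, "Fe")
dist = euclidean_distance_matrix(reduced)
```

Because the dominant Fe counts are dropped first, the distances here reflect only the remaining elements, which mirrors the motivation given for removing Fe before inspecting the PCA results.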
J. Nucl. Phys. Mat. Sci. Rad. A. Vol. 7, No. 2, Feb. 2020
Supplementary Tables S2-S4 served to classify the samples as a function of their elemental composition. Details of the manufacturing protocol of the pottery samples are unknown. With that in mind, we decided not only to analyze the whole elemental composition of each sample, but also to create two elemental subgroups: clay and complementary. Thus we artificially created three data sets. In an initial step, all elements identified were processed together. Subsequently, generic elements typically found in clay [14,22] were processed alone: Fe, Ca, K, Ti, Cu, Mn, and Zn. The rest of the elements, here called complementary, were also processed as an independent data set. The latter could be considered to have been added by the manufacturing process, perhaps as temper, glazing, or pigments, and/or through weathering over time and possibly handling.
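The construction of the three data sets can be sketched as follows. This is an illustrative fragment only: the clay element list is taken from the text, while the sample table, the non-clay elements shown, and the function name `split_by_group` are assumptions made for the example.

```python
# Elements treated as typical of clay, as listed in the text.
CLAY_ELEMENTS = ("Fe", "Ca", "K", "Ti", "Cu", "Mn", "Zn")


def split_by_group(elements, rows, clay=CLAY_ELEMENTS):
    """Return the 'all', 'clay', and 'complementary' data sets as (names, rows)."""
    clay_idx = [i for i, e in enumerate(elements) if e in clay]
    comp_idx = [i for i, e in enumerate(elements) if e not in clay]

    def take(idx):
        return [elements[i] for i in idx], [[row[i] for i in idx] for row in rows]

    return {
        "all": (list(elements), [list(row) for row in rows]),
        "clay": take(clay_idx),
        "complementary": take(comp_idx),
    }


# Hypothetical table: two samples, six of the fitted elements.
elements = ["Fe", "Ca", "Sr", "Zr", "K", "Rb"]
counts = [
    [9000, 300, 40, 12, 150, 7],
    [8000, 280, 43, 15, 140, 9],
]

datasets = split_by_group(elements, counts)
for name, (names, _) in datasets.items():
    print(name, names)
```

Each of the three entries can then be passed independently to the same distance and clustering steps, so that any difference between the resulting dendrograms is attributable to the choice of elements alone.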
Dendrograms are used to depict clustering. The hierarchical clustering algorithm used here is agglomerative: it proceeds by successively merging smaller clusters into larger ones, based on a distance criterion. In our case the criterion is built from the distances between vectors of elemental content. The samples have no visible decorations on them, and pigmentation of that nature could not be gauged unequivocally (Figure 1). We included symbols in all figures corresponding to dendrograms. They facilitate identification of groups, with the dashed line establishing a grouping boundary. The complete-linkage method yielded the dendrograms in Figure 4. The entire data set was clustered into four well-defined groups (Figure 4a). Clay elemental composition clustered the samples into five groups (Figure 4b).
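For reference, the distance criteria behind the three linkage methods can be written in one common convention, assuming Euclidean distances between sample vectors. Here A and B are two clusters, |A| and |B| their sizes, and \bar{a}, \bar{b} their centroids; the notation is introduced for this illustration and is not taken from the original text.

```latex
\begin{align}
d_{\text{complete}}(A,B) &= \max_{a \in A,\; b \in B} \lVert a - b \rVert_2 \\
d_{\text{average}}(A,B)  &= \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} \lVert a - b \rVert_2 \\
d_{\text{Ward}}(A,B)     &= \sqrt{\frac{2\,|A|\,|B|}{|A| + |B|}}\; \lVert \bar{a} - \bar{b} \rVert_2
\end{align}
```

At each step the pair of clusters with the smallest value of the chosen criterion is merged. For two single-sample clusters all three expressions reduce to the plain Euclidean distance, so the methods differ only in how they treat clusters that already contain several samples.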
Complementary elements were also clustered into five groups (Figure 5). The average-linkage method applied to the entire data set yielded four groups (Figure 6a), and that applied to elemental clay yielded five groups (Figure 6b). The analysis of the complementary-element data set clustered the samples into five groups (Figure 7). The dendrograms generated using the third method, Ward, based on all elements, the clay elemental content, and the complementary-element set are given in Figures 8a, 8b, and 9, respectively. Each clustering method was designed with some classification capability in mind [13]. However, the adequacy of a method depends on the geometrical distribution of the data and the type of information sought. If the data had a two-dimensional representation, a clustering method could perhaps be readily selected, but that is not so in the present case. Let us highlight once again that we are only searching for a statistically educated sense of perspective to guide future research decisions.
An objective measure was applied to quantify how faithfully the pairwise distances resulting from the clustering process reproduce those of the initial data. Thus a cophenetic correlation coefficient [7,20] was calculated for each data set and clustering methodology (Table 2). It serves as a check that helps us reduce the possibility of focusing on random effects driven intrinsically by the methodology itself.
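The cophenetic correlation coefficient can be illustrated with a small pure-Python sketch. It is the Pearson correlation between the original pairwise distances and the cophenetic distances, where the cophenetic distance of two samples is the dendrogram height at which they are first merged. The implementation below is a naive stand-in written for this example (only complete and average linkage, no Ward), not the authors' code; the sample points and function names are hypothetical. In practice a library routine such as those in `scipy.cluster.hierarchy` would normally be used.

```python
import math
from itertools import combinations


def pairwise(points):
    """Euclidean distance for every pair (i, j) with i < j."""
    return {
        (i, j): math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
        for i, j in combinations(range(len(points)), 2)
    }


def cophenetic_distances(points, method="average"):
    """Naive agglomerative clustering; returns (original, cophenetic) distances."""
    d = pairwise(points)
    clusters = [[i] for i in range(len(points))]
    coph = {}

    def linkage(a, b):
        vals = [d[min(i, j), max(i, j)] for i in a for j in b]
        return max(vals) if method == "complete" else sum(vals) / len(vals)

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        height, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p, q in combinations(range(len(clusters)), 2)
        )
        # Every cross pair first joins at this merge height.
        for i in clusters[p]:
            for j in clusters[q]:
                coph[min(i, j), max(i, j)] = height
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return d, coph


def cophenetic_correlation(points, method="average"):
    """Pearson correlation between original and cophenetic distances."""
    d, coph = cophenetic_distances(points, method)
    keys = sorted(d)
    x = [d[k] for k in keys]
    y = [coph[k] for k in keys]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical two-element "signatures" forming two tight groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
r_avg = cophenetic_correlation(points, "average")
r_comp = cophenetic_correlation(points, "complete")
print(f"average: {r_avg:.5f}  complete: {r_comp:.5f}")
```

A value close to 1 indicates that the dendrogram heights preserve the ordering and spacing of the original distances well, which is the sense in which the coefficient is used to compare linkage methods on the same data set.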

Summary
Pottery sherds were collected from three known locations in New Mexico and irradiated with X-rays from an Ag anode to extract elemental signatures from the resulting fluorescence. Subsequently, the elemental data were organized into a matrix in which each sample has elemental variables, and the samples were hierarchically clustered using three methods. We did not mean to impose the use of any particular method or to preselect certain sample groups or numbers of groups during the treatment. For that reason we employed hierarchical clustering methods that have relatively simple algorithms. The application of multivariate data mining methods, clustering, and PCA has been of a merely exploratory nature, to numerically select elemental signatures for further investigation. We have also sought to assist the reclassification of some of the samples. Both objectives have been aided by the numerical perspective provided by clustering methods. The front and back of most samples were analyzed, with the dual intent of enhancing the objectivity of clustering and exploring whether clustering could be indicative of the manufacturing process. Depending on the manufacturing process, the elemental concentrations of front and back would be closely related. However, that need not be the case, because the manufacturing process is not known and need not be the same on both sides of the sherd. Additionally, other factors, such as pigmentation, could introduce elemental discrepancies. Decorative pigmentation is not visually conspicuous on the samples analyzed here, however.
A favorable correspondence was found between the cophenetic correlation coefficient and features already known about the samples, which were highlighted by the average-linkage method. The average-linkage method, which has the highest cophenetic correlation coefficient, shows the best correspondence with the known geographical locations, although data element 12 would have to be set aside. Such a conclusion is suggested only by the complementary elemental composition (Figure 7). The same dendrogram would also suggest a reclassification of samples 6 to 11, which are of unknown geographical origin. Notice that this vantage point only arose after splitting the data set into two: clay and complementary elemental components. We are aware that the validity of the cophenetic correlation coefficient has been questioned [7,17,19]. Nevertheless, we used it because it is frequently applied in data mining techniques and clustering applications and it is simple to implement.
As already noted, average linkage does not match the front and back of each sample, which is of no concern. All clustering methods applied here yielded the same general conclusion of an elemental mismatch between front and back. Invariably, all clustering methods paired together the front and back of samples (0,1), (2,3), and (4,5). It may be inferred that there are peculiarities in their manufacturing process, and detailed analytical investigations by other techniques should follow.
As suggested by the PCA results, and in order to narrow the scope of analysis and focus the analytical efforts, Fe, Sr, Zr, Ca, and Rb should be analyzed in future studies of the same samples. Kuleff and Djingova [10] have listed these elements among the 23 most important elements for determining the provenance of pottery. Future studies should also extend the elemental range of characterization below the potassium (K) K-alpha line, which was not possible with the excitation source available at present. Elemental content and its depth distribution will be investigated by X-ray photoelectron spectroscopy and will also be cluster-analyzed on its own to compare with the present findings. In the future we will also collect X-ray diffraction data, which are expected to shed more light on the elemental variations of Fe, Sr, Zr, Ca, and Rb identified by PCA.