Since January 2020, BIPEA has organized two regular proficiency tests per year for these analyses. The results of the four most recent tests, since Salmonella was added to the samples, are summarized in Table 1 (Cronobacter detection) and Table 2 (Salmonella detection).
Table 1
Summary of Cronobacter detection results for four trials. Unacceptable results are indicated in italics.
| Trial | | Sample 1 | Sample 2 | Sample 3 |
|---|---|---|---|---|
| Trial 1 | Contamination scheme | Not spiked | Spiked | Spiked |
| | Laboratory results | Detected: 0; Not detected: 21 | Detected: 21; Not detected: 0 | Detected: 21; Not detected: 0 |
| Trial 2 | Contamination scheme | Spiked | Not spiked | Not spiked |
| | Laboratory results | Detected: 19; Not detected: 0 | Detected: 0; Not detected: 18 | Detected: 0; Not detected: 19 |
| Trial 3 | Contamination scheme | Not spiked | Not spiked | Spiked |
| | Laboratory results | *Detected: 1*; Not detected: 19 | *Detected: 2*; Not detected: 18 | Detected: 17; *Not detected: 3* |
| Trial 4 | Contamination scheme | Spiked | Not spiked | Not spiked |
| | Laboratory results | Detected: 20; Not detected: 0 | *Detected: 1*; Not detected: 17 | Detected: 0; Not detected: 19 |
Table 2
Summary of Salmonella detection results for four trials. Unacceptable results are indicated in italics.
| Trial | | Sample 1 | Sample 2 | Sample 3 |
|---|---|---|---|---|
| Trial 1 | Contamination scheme | Not spiked | Spiked | Not spiked |
| | Laboratory results | Detected: 0; Not detected: 6 | Detected: 6; Not detected: 0 | Detected: 0; Not detected: 6 |
| Trial 2 | Contamination scheme | Spiked | Spiked | Not spiked |
| | Laboratory results | Detected: 15; Not detected: 0 | Detected: 14; Not detected: 0 | *Detected: 1*; Not detected: 14 |
| Trial 3 | Contamination scheme | Not spiked | Not spiked | Spiked |
| | Laboratory results | Detected: 0; Not detected: 15 | *Detected: 1*; Not detected: 14 | Detected: 14; *Not detected: 1* |
| Trial 4 | Contamination scheme | Spiked | Spiked | Not spiked |
| | Laboratory results | Detected: 16; *Not detected: 1* | Detected: 18; Not detected: 0 | Detected: 0; Not detected: 18 |
For both pathogens, results are generally very satisfactory, regardless of the geographical origin of the laboratories. For 16 of the 24 sets of samples studied here, all laboratories concluded correctly, and for the remaining samples only between 5 and 15% of laboratories reported false negatives or false positives. Approximately 75% of laboratories used the reference method ISO 22964 [6] for Cronobacter detection and approximately 50% used the reference method ISO 6579-1 [7] for Salmonella detection; the choice of method had no noticeable effect on the results. In addition, performance in these tests has remained relatively stable over time, indicating that most laboratories master these detection analyses. It is also important to note that false positives are either as frequent as false negatives, in the case of Salmonella, or more frequent, in the case of Cronobacter. This is reassuring, as the consequences of false positives are primarily economic, such as unnecessary product recalls, while false negatives can lead to outbreaks and have serious impacts on public health, including the death of infected persons.
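As an illustrative cross-check of these figures, the short Python sketch below tallies the counts reported in Tables 1 and 2. The data layout and the function name `summarize` are assumptions made for this example and are not part of the BIPEA evaluation.

```python
# Each tuple is (spiked, detected, not_detected) for one sample,
# transcribed from Tables 1 and 2 (three samples per trial, four trials).
CRONOBACTER = [
    (False, 0, 21), (True, 21, 0), (True, 21, 0),   # Trial 1
    (True, 19, 0), (False, 0, 18), (False, 0, 19),  # Trial 2
    (False, 1, 19), (False, 2, 18), (True, 17, 3),  # Trial 3
    (True, 20, 0), (False, 1, 17), (False, 0, 19),  # Trial 4
]
SALMONELLA = [
    (False, 0, 6), (True, 6, 0), (False, 0, 6),     # Trial 1
    (True, 15, 0), (True, 14, 0), (False, 1, 14),   # Trial 2
    (False, 0, 15), (False, 1, 14), (True, 14, 1),  # Trial 3
    (True, 16, 1), (True, 18, 0), (False, 0, 18),   # Trial 4
]


def summarize(samples):
    """Return (unanimous samples, false positives, false negatives)."""
    unanimous = false_pos = false_neg = 0
    for spiked, detected, not_detected in samples:
        # A wrong report is "not detected" on a spiked sample,
        # or "detected" on a sample that was not spiked.
        wrong = not_detected if spiked else detected
        if wrong == 0:
            unanimous += 1
        elif spiked:
            false_neg += wrong
        else:
            false_pos += wrong
    return unanimous, false_pos, false_neg


print(summarize(CRONOBACTER))  # (8, 4, 3)
print(summarize(SALMONELLA))   # (8, 2, 2)
```

With the counts as transcribed, the two pathogens together give 16 unanimous samples out of 24.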
In recent years, several numerical scoring systems have been proposed for qualitative proficiency testing data, with the aim of making participant performance easy to interpret. The objective of these systems, which include the L-score [11], the a-score [12], and the S-score [13], is to mimic the widely accepted z-score used for quantitative data and give participants a simpler way to evaluate their results.
Each of these systems is a useful contribution to the assessment of qualitative PT data, making it easier to compare between tests and examine laboratory performance over time. However, each also has certain limitations. The L-score requires at least 10 participants, five different parameters where failure has been recorded, and specific statistical modeling software; in addition, it is fundamentally a relative evaluation rather than an absolute one, as the most satisfactory scores are impossible for a laboratory to achieve unless other laboratories perform poorly. For this reason, identical results can be judged differently on different tests, making continued assessment over time complicated. The a-score remedies several of these difficulties but needs a minimum of 20 participants to be implemented. The S-score removes this barrier but uses a more complex system that requires PT providers to define a priori the difficulty of each analysis, which leads to numerical scores with less transparent interpretations when compared with the simplicity of the z-score. Replicates are also necessary in some cases.
BIPEA applies the specificity and sensitivity indicated in ISO 22117 and considers relative specificity, sensitivity, and accuracy to be the system best suited to evaluating laboratory performance in its qualitative proficiency tests [14]. Relative accuracy is an easy-to-interpret measure of a laboratory's overall ability to carry out the analyses studied. Relative specificity and sensitivity make the essential distinction between incorrectly identifying negative samples and failing to detect positive samples, errors that call for significantly different corrective actions, just as large positive and large negative z-scores clearly indicate different kinds of analytical problems. In addition, a laboratory that participates in multiple rounds of such a PT can easily monitor changes in its performance by graphing relative accuracy against time, as described by Chabirand et al. [15].
By examining in detail the results of the four trials previously presented for the detection of Cronobacter (Table 3) and Salmonella (Table 4) in milk powder, each laboratory’s global performance can be evaluated using these assessment parameters. For each pathogen, all but three laboratories achieved relative accuracy scores of 100%, and therefore 100% relative specificity and sensitivity as well. The overall performance on these tests can thus be considered highly satisfactory. Participants in this program are provided with their relative specificity, sensitivity, and accuracy for each trial for which they submit results, and can easily calculate these three scores over a period of multiple trials if desired, as demonstrated here.
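The calculation over several trials can be sketched in a few lines of Python. This is a minimal illustration, not BIPEA's implementation: it assumes the usual definitions (relative specificity is the share of non-spiked samples reported as not detected, relative sensitivity the share of spiked samples reported as detected, and relative accuracy the share of all analyzed samples reported correctly). The function name `relative_scores` and the use of `None` for a sample that was not analyzed are choices made for this example. The data are the Cronobacter contamination scheme and the results of laboratory 2 from Table 3.

```python
from typing import Optional, Sequence, Tuple


def relative_scores(
    expected: Sequence[int],
    reported: Sequence[Optional[int]],
) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Return (rSP, rSE, rAC) in percent.

    expected: 1 for a spiked sample, 0 for a non-spiked sample.
    reported: 1 for "Detected", 0 for "Not detected", None if not analyzed.
    A score is None when no sample of the relevant kind was analyzed.
    """
    tp = tn = fp = fn = 0
    for exp, rep in zip(expected, reported):
        if rep is None:      # sample not analyzed: ignored in the scores
            continue
        if exp == 1:
            if rep == 1:
                tp += 1
            else:
                fn += 1
        else:
            if rep == 0:
                tn += 1
            else:
                fp += 1

    def pct(num: int, den: int) -> Optional[float]:
        return 100 * num / den if den else None

    return pct(tn, tn + fp), pct(tp, tp + fn), pct(tp + tn, tp + tn + fp + fn)


# Cronobacter contamination scheme over the four trials (Table 3 header)
SCHEME = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
# Results reported by laboratory 2 (Table 3)
LAB_2 = [0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0]

print([round(score) for score in relative_scores(SCHEME, LAB_2)])  # [86, 80, 83]
```

Leaving unanalyzed samples out of the three scores keeps them separate from participation, which the tables report in its own column.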
Table 3
Detailed results of four trials for the detection of Cronobacter in milk powder, including evaluation and rate of participation (rP) for each laboratory. The contamination scheme is displayed in the table header; 0 and 1 correspond to “Not detected” and “Detected”, respectively, and unacceptable results are indicated in italics.
| Lab | Trial 1, sample 1 (0) | Trial 1, sample 2 (1) | Trial 1, sample 3 (1) | Trial 2, sample 1 (1) | Trial 2, sample 2 (0) | Trial 2, sample 3 (0) | Trial 3, sample 1 (0) | Trial 3, sample 2 (0) | Trial 3, sample 3 (1) | Trial 4, sample 1 (1) | Trial 4, sample 2 (0) | Trial 4, sample 3 (0) | rSP (%) | rSE (%) | rAC (%) | rP (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 100 | 100 | 100 | 100 |
2 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | *1* | *0* | 1 | 0 | 0 | 86 | 80 | 83 | 100 |
3 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 100 | 100 | 100 | 100 |
4 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 100 | 100 | 100 | 100 |
5 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 100 | 100 | 100 | 100 |
6 | 0 | 1 | 1 | 1 | 0 | 0 | *1* | 0 | *0* | 1 | 0 | 0 | 86 | 80 | 83 | 100 |
7 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 100 | 100 | 100 | 100 |
8 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 100 | 100 | 100 | 100 |
9 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 100 | 100 | 100 | 100 |
10 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | | 0 | 100 | 100 | 100 | 92 |
11 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | | | 100 | 100 | 100 | 83 |
12 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | | | | 100 | 100 | 100 | 75 |
13 | | | | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 100 | 100 | 100 | 75 |
14 | | | | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 100 | 100 | 100 | 75 |
15 | | | | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 100 | 100 | 100 | 75 |
16 | | | | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 100 | 100 | 100 | 75 |
17 | | | | 1 | 0 | 0 | 0 | *1* | *0* | 1 | *1* | 0 | 67 | 67 | 67 | 75 |
18 | 0 | 1 | 1 | 1 | | 0 | 0 | 0 | 1 | | | | 100 | 100 | 100 | 67 |
19 | 0 | 1 | 1 | | | | | | | 1 | 0 | 0 | 100 | 100 | 100 | 50 |
20 | 0 | 1 | 1 | | | | | | | 1 | 0 | 0 | 100 | 100 | 100 | 50 |
21 | | | | 1 | 0 | 0 | 0 | 0 | 1 | | | | 100 | 100 | 100 | 50 |
22 | 0 | 1 | 1 | | | | | | | | | | 100 | 100 | 100 | 25 |
23 | 0 | 1 | 1 | | | | | | | | | | 100 | 100 | 100 | 25 |
24 | 0 | 1 | 1 | | | | | | | | | | 100 | 100 | 100 | 25 |
25 | 0 | 1 | 1 | | | | | | | | | | 100 | 100 | 100 | 25 |
26 | 0 | 1 | 1 | | | | | | | | | | 100 | 100 | 100 | 25 |
27 | 0 | 1 | 1 | | | | | | | | | | 100 | 100 | 100 | 25 |
28 | | | | | | | 0 | 0 | 1 | | | | 100 | 100 | 100 | 25 |
29 | | | | | | | | | | 1 | 0 | 0 | 100 | 100 | 100 | 25 |
30 | | | | | | | | | | 1 | 0 | 0 | 100 | 100 | 100 | 25 |
Table 4
Detailed results of four trials for the detection of Salmonella in milk powder, including evaluation and rate of participation (rP) for each laboratory. The contamination scheme is displayed in the table header; 0 and 1 correspond to “Not detected” and “Detected”, respectively, and unacceptable results are indicated in italics.
| Lab | Trial 1, sample 1 (0) | Trial 1, sample 2 (1) | Trial 1, sample 3 (0) | Trial 2, sample 1 (1) | Trial 2, sample 2 (1) | Trial 2, sample 3 (0) | Trial 3, sample 1 (0) | Trial 3, sample 2 (0) | Trial 3, sample 3 (1) | Trial 4, sample 1 (1) | Trial 4, sample 2 (1) | Trial 4, sample 3 (0) | rSP (%) | rSE (%) | rAC (%) | rP (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 100 | 100 | 100 | 100 |
2 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | | 1 | 0 | 100 | 100 | 100 | 92 |
3 | | | | 1 | 1 | 0 | 0 | *1* | *0* | 1 | 1 | 0 | 75 | 80 | 78 | 75 |
4 | | | | 1 | 1 | *1* | 0 | 0 | 1 | 1 | 1 | 0 | 75 | 100 | 89 | 75 |
5 | 0 | 1 | 0 | 1 | 1 | 0 | | | | 1 | 1 | 0 | 100 | 100 | 100 | 75 |
6 | | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 100 | 100 | 100 | 75 |
7 | | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 100 | 100 | 100 | 75 |
8 | | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 100 | 100 | 100 | 75 |
9 | | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 100 | 100 | 100 | 75 |
10 | | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 100 | 100 | 100 | 75 |
11 | | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 100 | 100 | 100 | 75 |
12 | | | | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 100 | 100 | 100 | 75 |
13 | 0 | 1 | 0 | 1 | | 0 | 0 | 0 | 1 | | | | 100 | 100 | 100 | 67 |
14 | | | | 1 | 1 | 0 | | | | 1 | 1 | 0 | 100 | 100 | 100 | 50 |
15 | 0 | 1 | 0 | | | | | | | 1 | 1 | 0 | 100 | 100 | 100 | 50 |
16 | | | | 1 | 1 | 0 | 0 | 0 | 1 | | | | 100 | 100 | 100 | 50 |
17 | | | | | | | | | | *0* | 1 | 0 | 100 | 50 | 67 | 25 |
18 | 0 | 1 | 0 | | | | | | | | | | 100 | 100 | 100 | 25 |
19 | | | | | | | 0 | 0 | 1 | | | | 100 | 100 | 100 | 25 |
20 | | | | | | | 0 | 0 | 1 | | | | 100 | 100 | 100 | 25 |
21 | | | | | | | | | | 1 | 1 | 0 | 100 | 100 | 100 | 25 |
22 | | | | | | | | | | 1 | 1 | 0 | 100 | 100 | 100 | 25 |
23 | | | | | | | | | | 1 | 1 | 0 | 100 | 100 | 100 | 25 |
If one of the primary goals of proficiency testing is to enable laboratories to demonstrate their competence for given analyses, there is a final factor to be considered. While it is clear that a laboratory with 100% relative accuracy has demonstrated greater ability than one with 33% relative accuracy, and that a laboratory that consistently achieves scores of 100% masters the analyses to a greater extent than one that oscillates between scores of 100% and 50%, frequency of participation must also be taken into account. For example, by studying multiple trials collectively as in Tables 3 and 4, it is possible to observe for each participant not only the rates of relative specificity, sensitivity, and accuracy, but also a final rate, the rate of participation (rP), which corresponds to the number of samples analyzed divided by the total number of samples proposed over a given time period. For an analytical laboratory, achieving a relative accuracy of 100% while participating in all four trials can be a way to signal greater expertise than obtaining the same rate while participating in a single test, and such performance can be extremely valuable for earning and maintaining consumer trust.
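The rate of participation can be computed in the same spirit. The sketch below is illustrative only: the function name `participation_rate` and the use of `None` to mark a sample that was proposed but not analyzed are assumptions made for this example. The data are the Cronobacter results of laboratory 10 in Table 3, which analyzed 11 of the 12 samples proposed.

```python
from typing import Optional, Sequence


def participation_rate(reported: Sequence[Optional[int]]) -> float:
    """Percentage of proposed samples for which a result was submitted.

    reported: one entry per sample proposed over the period,
    with None where the laboratory did not submit a result.
    """
    if not reported:
        raise ValueError("no samples were proposed over the period")
    analyzed = sum(1 for result in reported if result is not None)
    return 100 * analyzed / len(reported)


# Laboratory 10, Cronobacter (Table 3): one sample of Trial 4 not analyzed
LAB_10 = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1, None, 0]

print(round(participation_rate(LAB_10)))  # 92
```

Reporting this rate alongside relative accuracy makes it visible whether a score of 100% rests on twelve samples or on three.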