BACTERIAL ISOLATE CHARACTERISATION
Genomic characteristics of the Escherichia coli and Clostridium perfringens
The quality of the E. coli and C. perfringens assemblies (number of contigs, contig length, coverage) are in Table S1. Flye generated assemblies with 25.9 contigs for the 26 E. coli isolates (on average, SD = 13.1), and 26 contigs for the 9 C. perfringens isolates (average, SD = 30.5). The average E. coli genome size was 5.57 Mbp (SD = 253,737 bp), and 3.46 Mbp (SD = 84,034 bp) for C. perfringens. The coverage was higher for the C. perfringens compared to the E. coli (546× and 253× respectively). Species assignments by KmerFinder confirmed all isolates as E. coli or C. perfringens (Table S2). More detailed genomic characterisation of the E. coli and C. perfringens isolates is presented in Table S3.
Phylogenetic relationship among the Escherichia coli and Clostridium perfringens isolates
To study the phylogenetic relationships among E. coli and C. perfringens isolates, maximum-likelihood (ML) trees were constructed based on core-genome alignments and single nucleotide polymorphisms (SNPs) (Figure S2A). The tree topologies derived from core-genome and SNPs were highly similar in E. coli, while for C. perfringens the SNP-based phylogeny was not consistent with MLST assignment and grouped the isolates with lower confidence (bootstrap support values < 70).
Core-genome and SNP-based phylogenies revealed several well-supported E. coli clusters (bootstrap > 83). Isolates from different pig microbiomes occasionally showed close relatedness and shared same MLST (Figure S2A), e.g., DTU 52/DTU 56 (ST224, 462 SNPs) and LVK 94/LVK 97 (ST100, 711 SNPs). Within-microbiome isolates such as LVK 60/DTU 60 (ST641) and LVK 83/DTU 83 (ST48) showed variable SNP differences (845 and 120, respectively), indicating different strain-level diversities from the same microbiome (Figure S2; Table S4).
C. perfringens isolates showed more phylogenetic divergence with weaker support (bootstrap < 70), reflecting greater genomic variability (Figure S1; Table S4), e.g., CP 61 and CP 83 showed the highest divergence and were the most distant from the rest of the isolates. The majority of the remaining isolates were more closely related (CP 94, CP 60, CP 43 and CP 11) and belonged to ST850. The remaining isolates were either of undetermined ST or more distant (Figure S1; Tables S3 & S4).
Escherichia coli pathotyping and virulence factors detection
Across all E. coli isolates, three pathotypes were detected and 466 virulence genes were identified (Figure S2B; Table S5), primarily associated with adhesion and secretion systems. Fourteen isolates were identified as ETEC, three as EPEC, three as STEC/EHEC, and the remaining were undetermined. Mixed pathotypes were also detected in the same sample (e.g., LVK 48), referring to pathovar differences in the same faeces. Pathotype distribution did not follow phylogeny (SNPs) (Figure S2; Table S16), consistent with the plasmid-borne nature of the virulence determinants used to determine the pathovars.
Clostridium perfringens toxino-typing and virulence factors detection
C. perfringens virulence is associated with toxin production (7), and all our nine C. perfringens were toxinotype A carrying the cpa (or plc) alpha-toxin producing gene and no other toxin genes as described in the toxino-typing scheme by Rood et al. (2018) (24) (Figure S2 and Table S5). Other diarrhoea-associated genes included cpb2 gene, encoding the C. perfringens beta 2 toxin (CPB2), found in one isolate (CP 11) and pfo gene encoding Perfringolysin O or theta toxin (PFO) carried by all isolates. In addition, all strains harboured ccp (encodes possible clostridial myonecrosis-associated alpha-clostripain toxin) and the two-component regulatory system of pfoA and cpa/plc, virS and virR. None of the C. perfringens strains harboured netB gene, implicated in necrotic enteritis, or cpe gene for C. perfringens enterotoxin production. However, one isolate (CP 12) carried the heat-stable enterotoxin 1-encoding east1 gene, commonly associated with ETEC.
Detection of bacteria and intestinal parasites
PromethION metagenomes consistently detected a significantly (p < 0.05) greater number of bacterial species and yielded higher sequencing depths compared to GridION (Figure S6A; Table S9B; Table S10). Several Lactobacillus species (including L. amylovorus, L. johnsonii, L. crispatus) and Limosilactobacillus reuteri, were among the top 50 most abundant bacterial species in the pigs microbiomes that were detected across all platforms and approaches. Clostridium sp. was also among the most abundant bacterial species, although identified by the PromethION approach in the majority of the cases. GridION sequencing approaches also detected additional species, including L. johnsonii, Streptococcus alactolyticus, L. crispatus, and Dorea longicatena (known pig microbiome healthy bacteria (39)), although at lower depths. Interestingly, several species detected at very low depth by PromethION, such as Kurthia huakuii, Phocea massiliensis, and Vagococcus fluvialis, were also captured by GridION (Figure S6A).
The deep PromethION sequencing allowed us to identify bacterial families that separate diarrheal from healthy pig microbiomes (Figure S6B). Carnobacteriaceae and Corynebacteriaceae detected in the diarrheal pig microbiomes are likely the key drivers that separate healthy from diarrheal groups. Carnobacteriaceae are lactic acid bacteria present in the porcine gut and responsive to diet transitions around weaning, consistent with well-described weaning-associated microbiome shifts (40–43), whereas Corynebacteriaceae include opportunistic pathogens of livestock, such as C. pseudotuberculosis and the C. renale group, reported in small ruminants and sometimes associated with gastrointestinal signs (44).
To assess the presence of clinically relevant taxa, the metagenomic reads of the diarrhoeal pig samples were additionally screened against a panel of pathogenic bacteria defined previously (13) and updated in this study. E. coli was the dominant taxon, consistently detected across nearly all samples and approaches when present in high depth (Fig. 3; Table S11). Other bacterial pathogens, identified by PromethION, including Staphylococcus aureus, Klebsiella oxytoca, Klebsiella pneumonia, Enterococcus faecium, Listeria monocytogenes and Acinetobacter baumannii, were also identified by all sequencing approachesbut generally at very low abundances. Overall, the PromethION research-based approach detected a wide range of pathogenic taxa, while the rapid GridION approaches identified only a few of these species (Fig. 3).
Regarding intestinal parasites, B. pilosicoli and L. intracellularis, amongst other major contributors to PWD, were identified by qPCR in 14 and 13 samples, respectively, with threshold of Ct ≤ 35 (Table 1). PromethION confirmed L. intracellularis in eight of these and detected Brachyspira spp. in 15 samples (Figure S7). Additionally, Enterocytozoon bieneusi was the only parasite previously associated with intestinal dysbiosis in pigs that was consistently detected across all diarrhoeal and healthy pigs, primarily by PromethION (Table S12). Since parasites affecting pigs have larger genomes than prokaryotes (e.g., Blastocystis: 12 Mbp), the raw data were also screened at lower genome coverage (i.e., 0.1), which resulted in detection of Blastocystis spp. in four of the diarrhoeal samples by PromethION and Cryptosporidium spp. across all diarrhoeal samples and approaches (diarrhoeal and healthy) albeit with very low depth (Table S12).
Detection of the diarrhoea-causing E. coli in the metagenomic samples
High-quality reads from each metagenomic dataset were mapped to a database of the assembled genomes from the 26 E. coli isolates recovered by culturing, allowing to assess whether the cultured isolates could be detected in the corresponding faecal metagenomes. Strong alignment (71 % − 96% coverage) between the E. coli genomes and the respective metagenomic sample was observed infive cases (e.g., MtG 11 to LVK 11; MtG 43 to LVK 43; MtG 48 to LVK 48; MtG 60 to LVK 60 and LVK 60; MtG 83 to LVK 83) (Fig. 4). In other cases (e.g., MtG LVK 48, MtG LVK 60, MtG LVK 83), the metagenomic reads aligned to multiple E. coli genomes at variable coverage, with several E. coli isolates dominating the metagenome (Fig. 4). This indicates that the approach is hindered by the presence of closely related E. coli isolates in the metagenomes (e.g., MtG LVK 82 and MtG LVK 83 Figure S1), and the limits of discriminating bacteria at strain level using direct read mapping.
Detection of virulence factors in E. coli isolates and the metagenomes
ETEC is the E. coli pathovar that has been mostly associated with PWD (38). Fourteen of the 26 E. coli isolates were assigned to ETEC (Figure S2A). Unique characteristic of this pathotype is carriage of genes encoding F4 or F18 fimbriae and/or two types of enterotoxins, heat-stable (STa and STb) and/or heat-liable (LT), located on plasmids with replicon types IncFIC and/or IncFIB (4).
Three ETEC isolates carried est and elt genes for both enterotoxins (LT and ST), 10 harboured either LT or STa, and none of them were detected in one ETEC isolate (Fig. 5A).
Regarding fimbriae genes, seven ETEC carried genes for F4 fimbriae, six for F18 and one isolate had no fimbriae genes (LVK 83). Overall, 12 ETEC carried genes for at least one enterotoxin and one fimbriae, while the rest of the ETEC isolates carried either enterotoxin or fimbriae genes. All ETEC isolates carried genes for production of either fimbriae or enterotoxins (Fig. 5A). F4 fimbriae, typically associated with ETEC in post-weaning pigs, was detected only in ETEC isolates in this study, a variant of F18 (F18ab, which differs from the one found in ETEC - F18ac) found in STEC causing edema disease (46) was also detected in three STEC isolates and in four E. coli isolates with undetermined pathotype (Fig. 5A).
Metagenomics sequencing detected all virulence factors that were found in the E. coli isolates, with only few inconsistencies (Fig. 5B). Additionally, metagenomics identified more enterotoxin and fimbriae genes in the faecal samples (Fig. 5), suggesting presence of uncultured ETEC isolates or other E. coli pathovars carrying variants of these virulence factors.
Metagenomics and qPCR comparison also showed overall concordance with only several inconsistencies (Fig. 5B). While 18 samples were F4-positive by qPCR, nine of these were F4 negative by metagenomics, however in two of them other F4 sub-unit genes (faeF, faeH, faeI) were detected (Fig. 5B). 21 samples were F18-positive by qPCR compared to 15 by metagenomics, and one with other F18 sub-unit genes (fedE, fedF). Most discordances were cases where qPCR was positive but metagenomics negative, however, one metagenomic sample (LVK 52) was F18-positive with metagenomics yet negative using qPCR (Fig. 5B).
Detection of antimicrobial resistance genes in the E. coli isolates and the metagenomes
The cultured E. coli WGS harboured genes for resistance to eight antibiotic classes, while in the metagenomic samples resistance to 16 antibiotic classes was observed (Figure S8A). All ARGs identified in the E. coli isolates were also detected in the corresponding metagenomes (Table S13). A variety of ARG profiles were identified in the E. coli isolates with greatest diversity of genes conferring resistance to aminoglycosides (sixteen different aminoglycoside resistance genes), followed by macrolide lincosamide streptogramin B (MLS) antibiotics (Figure S8A). All ETEC isolates but three (two EPEC and one of undetermined pathotype) were multidrug resistant (MDR), carrying genes for resistance to aminoglycosides, beta-lactams, macrolides, tetracyclines, sulfonamides, trimethoprim and quinolones. An STEC isolate (LVK 48) harboured resistance genes to seven of the eight antibiotic classes detected among the isolates (Figure S8A).
Some ETEC isolates within a phylogenetic cluster exhibited nearly identical ARG repertoires (e.g., DTU 82/LVK 83/DTU 83; DTU 56/DTU 52 and DTU 60/LVK 60), while other phylogenetically close ETEC isolates carried markedly different ARG profiles (e.g., DTU 80/DTU 48; LVK 97/LVK 94 (Figure S8C) attributable to the varying plasmid content (Figure S8D). Amongst the corresponding metagenomes, the highest diversity of ARGs was observed among the MLS antibiotics, followed by aminoglycosides and beta-lactam antibiotics (Figure S8B). This pattern reflects antimicrobial use in Danish pig production, where macrolides are commonly applied for Lawsonia infections in slaughter pigs, while aminoglycosides are widely used to treat E. coli-associated PWD in weaners. Importantly, no ESBL, carbapenemase, or mcr genes were detected neither in the single isolates nor in the metagenomic samples. PointFinder also could not detect any known mutations in the gyrA or parC genes linked to fluoroquinolone resistance.
Since beta-lactam resistance genes are one of the most abundant ARGs, especially in Gram-negative bacteria and often occur with other ARGs on mobile genetic elements (47), we examined their occurrence and detection in both single isolates and diarrheal samples across sequencing platforms and approaches (Figures S8 and S9). While blaTEM-1B was the only beta-lactam gene detected in the E. coli isolates (Figure S8C), 12 bla gene families were identified in the metagenomic samples, blaTEM was the most abundant bla gene among them, however, cfxA, due to its association with Bacteroides, was the most abundant beta-lactam gene followed by blaOXA, blaCARB, blaACI and blaEBR gene families (Figure S9A), often co-occurring with other ARGs on mobile genetic elements in bacterial pathogens such as Pseudomonas, Proteus spp. and Acinetobacter spp. (some of which were among the detected pathogens in the metagenomic samples (Fig. 3)).
When comparing the detection of beta-lactam resistance genes across sequencing approaches, in 50% of the metagenomic samples none of the bla gene families were detected by the GridION 23-plex compared to the PromethION (Figure S9A). However, when ARGs were merged into antibiotic groups to which they confer phenotypic resistance (in accordance with the ResFinder database), the three most abundant antibiotic classes (tetracyclines, aminoglycosides and MLS) were also detected by the GridION rapid detection, although at lower depths (Figure S9B). Lastly, the aminoglycoside antibiotics were the most abundant class found in nearly all E. coli isolates (Figure S9C), likely explained by the use of aminoglycoside antibiotics (neomycin and apramycin) for treatment of PWD in Danish pig farming.
Comparison between E. coli MAGs and single isolates
High-quality (HQ; >90% completeness, < 5% contamination) and medium-quality (MQ; 50–90% completeness, < 10% contamination) E. coli MAGs were assembled from 14 metagenomic samples (Table S14). Eight MAGs had > 90% completeness and genome sizes of 4.6–5 Mbp (Table S14). E. coli could not be cultured from faecal sample LVK 18, but an E. coli MAG from LVK 18 was reconstructed. MLST assignment was only possible for MAG LVK 74 (ST10, 99.96% completeness) and MAG LVK 83 (ST48, 100% completeness) as MAGs often suffer of low completeness. In both cases, the E. coli single isolates and the reconstructed MAGs had the same ST, indicating that the MAGs and the isolates could be the same strain.
E. coli genome comparisons using phylogeny and average nucleotide identity (ANIb)
To examine how the constructed E. coli MAGs compared with the cultured E. coli WGS from the corresponding microbiome, SNP and core-genome phylogenetic inferences were constructed. Both SNP and core-genome phylogenies had the same topologies and similarly clustered the E. coli MAGs and single isolates. Six cultured E. coli isolates closely clustered with the reconstructed MAGs from the same samples (Fig. 6A). Among those, MAG LVK 48/SI LVK 48 and MAG LVK 43/SI LVK 43 had 3557 and 6719 SNP differences, respectively (Fig. 6A; Table S15), and MAG LVK 74/SI LVK 74 had only 1041 SNPs difference (Fig. 6A; Table S15). Others paired MAGs and WGS isolates had lower MAG/genome coverage (58% and 64% between MAG LVK 82 and SI DTU 82 and MAG LVK 88 and SI DTU 88, respectively (Figure S10B)), yet remained closely related in the SNP matrix (Figure S10A; Table S14 & S15).
Other paired MAGs and WGS E. coli that showed weaker correspondence, e.g., MAG LVK 42 and MAG LVK 98 were of low quality (61% and 55% completeness, respectively). Other E. coli MAG – isolate pairs showed weaker or more complex correspondence. MAG LVK 43 and MAG LVK 48 were clearly closest to their cultured isolates, yet differed by 6719 and 3557 SNPs (Figure S10A; Table S14; Table S15), respectively, compared to their WGS isolates (SI LVK 43 and SI LVK 48). MAG LVK 82 was only 58% complete and differed from isolate DTU 82 by 4419 SNPs, but was almost equally close to DTU 83 and LVK 83, which differed by just 120 SNPs among one another, suggesting that multiple closely related E. coli lineages co-occurred in these microbiomes. In contrast, MAG LVK 83 was 90% complete and closely matched both isolates from the same faecal sample (278 and 503 SNPs), confirming successful recovery of that lineage. MAG LVK 88 also showed partial overlap with SI_DTU_88 (6,203 SNPs at 64% coverage), while MAG LVK 74, in addition to clustering with SI LVK, also grouped with isolates from LVK 17, DTU 42, and LVK 102 (1,600–2,000 SNPs; Fig. 6A; Figure S10A), indicating the presence of related strains across different pigs.
The results of the SNP-based phylogeny were further supported by ANIb analysis. Hadamard values (ANIb × coverage) (Figure S10B) were highest for the same MAGs – E. coli genomes MAG LVK 48 to SI LVK 48 (0.82), MAG LVK 43 to SI LVK 43 (0.74), MAG LVK 74 to SI LVK 74 (0.84), and MAG LVK 83 to SI DTU 83A (0.90). MAG LVK 82 and MAG LVK 88 showed intermediate values (0.59–0.62) due to their lower completeness.
Dot plot alignments were also constructed to compare the synteny between the common regions of the cultured E. coli isolates and the reconstructed MAGs for the closely-related pairs in the SNP-based phylogeny (Fig. 6A) with overlapping genomic regions ranging 55–90% (Fig. 6B; Figure S11). While some of them showed strong collinearity with large genomic overlaps and only a few gaps or duplications (e.g., MAG LVK 43 and SI LVK 43, 75% and MAG LVK 74 and SI LVK 74, 82%) (Fig. 6B), other pairs exhibited larger gaps (possibly due to incomplete MAG), inversions or low genomic overlaps (Figure S11), which could be attributed to incomplete or misassembled MAGs or strain-level variations.
Sub-typing of the E. coli MAGs
Pathotyping was generally not possible for the E. coli MAGs, except for MAG LVK 42 and MAG LVK 48 (Table S16), as the majority of the genes that determine the E. coli pathotypes are located on plasmids (e.g., hlyA, sta1, stb, eltAB), which is consistent with the absence of plasmid replicon genes and plasmid-borne virulence genes in the MAGs. The only virulence gene that was consistently found in the MAGs was hlyA, and because hlyA is not exclusive to pathogenic E. coli, its presence is not indicative for a disease-associated E. coli. As E. coli share large parts of their genomes, the constructed MAGs could also belong to commensal E. coli strains. Interestingly, SI DTU 42C (STEC) carried stx and fimbrial genes, whereas its corresponding MAG (MAG LVK 42) was typed as EPEC based on the eae gene, suggesting that two distinct E. coli lineages could have been recovered in that sample, by metagenomics and culturing. Similarly, MAG LVK 48 (STEC) formed a phylogenetic cluster with SI LVK 48 (STEC), but away from SI DTU 48 (ETEC), indicating successful construction of the STEC but not the ETEC strain (Fig. 6; Table S16). Similarly, due to the incomplete MAGs only two could be sub-typed by the Achtman 7-gene MLST E. coli scheme (LVK 74, ST10 and LVK 83, ST48), both of which matched their cultured isolates. Screening of MAGs against VFDB failed to identify any of the fimbrial (F4, F18) or enterotoxin (estA, eltAB) genes that were present in the corresponding isolates (Fig. 5; Table S18), consistent with their plasmid location. Likewise, no plasmid replicon genes were recovered from MAGs, in contrast to the isolates where multiple replicons from various incompatibility groups were frequently detected (e.g., IncF, IncX, IncY) (Table S6). ARGs were detected in two MAGs only (MAG LVK 56 and MAG LVK 97), however their profiles differed from the corresponding WGS: MAG LVK 56 carried erm(B) and ant(3'')-Ia, which were absent from SI DTU 56, while SI LVK 97 carried dfrA and sul1 but MAG LVK 97 harboured mef(C)-mph(G) and sul2, both located on the same contig (Table S20).