Sunday, March 3, 2024

The plasmidome associated with Gram-negative bloodstream infections: A large-scale observational study using complete plasmid assemblies – Nature Communications

We sequenced and assembled n = 953 isolates of which n = 738 were complete and included in subsequent analysis (Supplementary Fig. S1). Of these, 75% (553/738) were E. coli (n = 153, 297, 103 in 2009, 2018, intervening years, respectively), 22% (161/738) Klebsiella spp. (n = 39, 58, 64 in 2009, 2018, intervening years, respectively) and 3% (24/738) other Enterobacterales species (details in Supplementary Fig. S1 and Supplementary Data 3). In total, these 738 isolates carried 1,880 plasmids with a median of 2 plasmids per isolate (interquartile range (IQR) 1–3). 10% (77/738) isolates carried none, 29% (211/738) carried one and 61% (450/738) more than one (Fig. 1A). Of the n = 661/738 isolates with at least one plasmid, 77% (508/661) carried at least one large plasmid (i.e., sequence length >100,000 bp), and 94% (621/661) at least one large or medium plasmid (i.e., sequence length >10,000 bp); of these 53% (329/621) also carried at least one small plasmid (i.e., sequence length <10,000 bp). Carriage of one or more small plasmids in the absence of any medium or large plasmid was relatively rare at 6% (40/661). Rarefaction analysis suggested that a substantial number of plasmid groups (defined using a graph-based clustering method, see below) remain unsampled and that there is a significantly greater diversity amongst groups containing smaller (<100,000 bp) vs larger (≥100,000 bp) plasmids (Fig. 1B). There was some evidence that Klebsiella spp. isolates tended to carry slightly more plasmids than E. coli: median 2 (IQR 1–5) vs median 2 (IQR 1–3) plasmids, respectively (Kruskal–Wallis, P value = 0.03; Fig. 1A), as did multi-drug-resistant (MDR, i.e., carriage of ≥3 ARG classes) vs. non-MDR isolates (n = 317/738 vs. n = 421/738 isolates; median 3 (IQR 2–4) vs. median 2 (IQR 1–3); Kruskal–Wallis, P value <0.001).
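The rarefaction analysis mentioned above can be illustrated with a minimal sketch. It assumes one group label per sequenced plasmid and averages the number of distinct groups seen over random sampling orders; the function name, the toy labels and the number of permutations are hypothetical and are not taken from the study's own pipeline.

```python
import random


def rarefaction_curve(group_labels, n_permutations=100, seed=0):
    """Mean number of distinct groups seen after sampling 1..N plasmids."""
    rng = random.Random(seed)
    n = len(group_labels)
    totals = [0] * n
    for _ in range(n_permutations):
        order = list(group_labels)
        rng.shuffle(order)  # one random sampling order
        seen = set()
        for i, label in enumerate(order):
            seen.add(label)
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


# Hypothetical toy input: two common groups and four singletons.
labels = ["g1"] * 5 + ["g2"] * 3 + ["g3", "g4", "g5", "g6"]
curve = rarefaction_curve(labels)
```

A curve that is still rising at its right-hand end, as here because of the singletons, is the pattern that suggests unsampled groups remain.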

Fig. 1: Characteristics of plasmid carriage in E. coli/K. pneumoniae bloodstream infections.

A, B Number of plasmids per isolate for E. coli (A) and Klebsiella spp. (B), coloured by the number of ARG classes per isolate, where MDR is ≥3, AMR 1–2 and no AMR 0. C Rarefaction curve of the number of novel plasmid groups (as defined using the Louvain-based method described above) per new plasmid sequenced, stratified by size (large ≥100,000 bp, medium ≥10,000 to <100,000 bp, small <10,000 bp). D Number of plasmid-associated ARGs per isolate vs number of plasmids carrying at least one ARG. Isolates with only one plasmid-associated ARG (by definition, carried on one plasmid) are excluded. Source data are provided in the supplementary “Source Data” file.

Despite comprising a relatively small proportion of the total genome (median = 2.79%, IQR = 1.97–3.97%), plasmids carried 39% (2069/5311) ARGs, 12% (987/8315) virulence genes and 60% (2836/4735) stress response genes. 50% (368/738) isolates carried at least one plasmid-borne ARG and 306 at least 2; of these, 79% (242/306) carried all annotated ARGs on a single plasmid (Fig. 1C). In isolates with a medium or large plasmid, co-carriage of a small plasmid was significantly more common in isolates harbouring plasmid-borne ARGs 58% (210/361) vs. 46% (119/260) without (Fisher test, P value = 0.003).
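The Fisher test comparison above can be reproduced approximately from the reported counts alone. The sketch below is a plain two-sided Fisher exact test written with the standard library; the 2x2 layout (small plasmid present or absent, by plasmid-borne ARG status) is inferred from the counts in the text, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def weight(x):
        # Unnormalised hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    w_obs = weight(a)
    # Sum every table that is no more probable than the observed one.
    total = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= w_obs)
    return total / denom


# Rows: isolates with vs without plasmid-borne ARGs (361 and 260).
# Columns: small plasmid co-carried vs not.
p_value = fisher_exact_two_sided(210, 361 - 210, 119, 260 - 119)
```

Integer weights are compared directly so that ties between equally probable tables are not affected by floating-point rounding.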

Most BSI isolates carry a large (>100,000 bp) plasmid from a small number of common plasmid groups

We first attempted to classify plasmids using existing tools; 17% (317/1880) plasmids could not be assigned a replicon type, and 33% (622/1880) had no identifiable relaxase type. Similarly, 26% (487/1880) plasmids were not typable using the recently described Plasmid Taxonomic Unit (PTU) scheme13; 7% (128/1880) were not typable by any method tested. We therefore opted to use a previously described classification approach, utilising a graph-based Louvain community detection algorithm14 (see “Methods”), which has the advantage of not being reliant on reference databases for group assignment and is thus able to classify all plasmids into groups (hereafter referred to as “plasmid groups”). These Louvain-based plasmid groups generally clustered plasmids together at a lower distance threshold (i.e., more similar) than the other methods tested (median 0.251, IQR 0.051–0.522 vs 0.692, IQR 0.561–0.852 for COPLA/PTU clusters; 0.968, IQR 0.856–1.000 for MOB-suite/relaxase typing; and 0.664, IQR 0.367–0.928 for PlasmidFinder/replicon typing; Supplementary Fig. S2). This approach yielded 513 groups from 1880 plasmids, of which 164 (32%) contained >1 plasmid, but only 33 (6%) contained ≥10 plasmids, and most were singletons (349/513 (68%)). As expected, given the more closely related groupings identified by the Louvain-based approach, this method created more groups compared to the others tested, and more of these were singletons (Supplementary Data 4). In all, 322/553 (58%) E. coli isolates carried a plasmid from one of the four most common, predominantly E. coli-associated, large (≥100,000 bp) plasmid groups (groups 4, 6, 7 and 8 in Fig. 2, all PTU-FE) and similarly 76/161 (47%) Klebsiella spp. isolates contained a plasmid from one of the three most common, predominantly Klebsiella spp.-associated, large plasmid groups (groups 1, 2 and 5 in Fig. 2; PTU-E35, FK and FK, respectively).
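Louvain community detection works by searching for a partition of the graph that maximises modularity. As a rough illustration of that objective (not of the search itself, and not of the authors' implementation), the sketch below computes the modularity of a given partition of a weighted, undirected graph. The edge weights are assumed here to be similarities; how weights were defined in the study is described in its Methods and is not reproduced here. All names and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of a weighted, undirected graph.

    edges: iterable of (u, v, weight); community: dict node -> community id.
    """
    total_weight = 0.0
    strength = defaultdict(float)  # summed weighted degree per community
    internal = defaultdict(float)  # edge weight falling inside each community
    for u, v, w in edges:
        total_weight += w
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(
        internal[c] / total_weight - (strength[c] / (2 * total_weight)) ** 2
        for c in strength
    )


# Hypothetical toy graph: two triangles of plasmids joined by one edge.
toy_edges = [
    ("p1", "p2", 1.0), ("p2", "p3", 1.0), ("p1", "p3", 1.0),
    ("p4", "p5", 1.0), ("p5", "p6", 1.0), ("p4", "p6", 1.0),
    ("p3", "p4", 1.0),
]
two_groups = {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1, "p6": 1}
one_group = {node: 0 for node in two_groups}
q_split = modularity(toy_edges, two_groups)
q_merged = modularity(toy_edges, one_group)
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is the kind of comparison a modularity-based method uses to decide group membership.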

Fig. 2: Phylogenetic distribution of the most common (n ≥ 10 members) plasmid groups (n = 33 groups) and the content of these.

The tree is a neighbour-joining tree built on Mash distances between chromosomes. Tip colours represent species/phylogroup. The black bars represent the presence or absence of plasmid groups (shown along the bottom x axis) for each isolate in the tree. The right panel shows the percentage of isolates within each of these 33 plasmid groups carrying the genes indicated (darker colours denote higher proportion of isolates carrying gene). To improve readability, gene groups have been clustered together. Source data are provided in the supplementary “Source Data” file.

Plasmid groups are structured by host phylogeny but there is evidence of intra- and inter-species transfer events

Overall, 141/513 (27%) groups were found in >1 MLST and 22/513 (4%) were found in more than one species; multi-species groups had ≥10 members significantly more commonly (8/22 (36%) vs 25/491 (5%), P < 0.001) (Fig. 2). We found strong evidence that the pangenome of the plasmidome of BSI isolates was structured by host phylogeny, although there was also vast and persistent background diversity. Sequence type and host species explained 8% and 7% (Adonis P = 0.001 for both) of the observed variance in gene content between plasmidomes, respectively. ARG content explained a comparatively small amount of variance (R2 = 2%, P = 0.001), as did year of isolation (R2 = 0.03%, P = 0.005) and source attribution (i.e., the suspected focus of infection, only available for a small subset of isolates [198/738]; R2 = 1.2%, P = 0.99) (Fig. 3, panels a, b, c and d, respectively). When we focussed on plasmid groups found in the most common E. coli STs (131, 95, 73), we observed that most were seen in only a single ST (78/109), but 13 “generalist” groups were seen in all three STs, and accounted for the majority of plasmids (215/400, 54%). Highly similar plasmidomes were seen in genetically divergent members of each ST, consistent with multiple horizontal transfer events (Supplementary Fig. S3). Persistent plasmid groups seen in both 2009 and 2018 were also seen in more phylogenetically diverse isolates within STs (Supplementary Fig. S4), suggesting that the persistence of plasmids may be linked to their host range potential.
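The variance-explained figures above come from an Adonis (permutational multivariate analysis of variance) test. As a simplified illustration of how such an R2 and permutation P value arise, the sketch below implements a one-factor PERMANOVA on a precomputed distance matrix. It is an assumption-laden stand-in: a single grouping factor, unrestricted label permutations, and hypothetical names and toy data. The authors' actual model specification is in their Methods.

```python
import random


def permanova(dist, groups, n_permutations=999, seed=0):
    """One-factor PERMANOVA on a square distance matrix (list of lists).

    Returns (R2, P value); R2 is the share of total variation between groups.
    """
    n = len(groups)

    def pseudo_f_and_r2(labels):
        sizes, within = {}, {}
        for g in labels:
            sizes[g] = sizes.get(g, 0) + 1
        ss_total = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                d2 = dist[i][j] ** 2
                ss_total += d2 / n
                if labels[i] == labels[j]:
                    g = labels[i]
                    within[g] = within.get(g, 0.0) + d2 / sizes[g]
        ss_within = sum(within.values())
        ss_between = ss_total - ss_within
        k = len(sizes)
        f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
        return f_stat, ss_between / ss_total

    f_obs, r2 = pseudo_f_and_r2(groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between labels and distances
        f_perm, _unused = pseudo_f_and_r2(shuffled)
        if f_perm >= f_obs * (1 - 1e-9):
            hits += 1
    return r2, (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: six plasmidomes placed on a line, two host groups.
points = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(a - b) for b in points] for a in points]
toy_groups = ["ST131"] * 3 + ["ST73"] * 3
r2, p_value = permanova(toy_dist, toy_groups)
```

In the toy data nearly all variation lies between the two groups, so R2 is close to 1; the single-digit percentages reported above correspond to groupings that explain only a small part of a much more diffuse cloud.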

Fig. 3: A UMAP projection of distances (measured by gene presence/absence) between the plasmidomes of isolates (each point represents the plasmidome, i.e., all plasmid sequences of a single isolate).

These are coloured to show the variability explained by species (A)/ARG carriage (B)/year (C) and infection source (D). Source data are provided in the supplementary “Source Data” file.

Common plasmid groups share genes with each other; gene sharing with chromosomes is also frequent

Whilst we observed that only 4% (22/513) of plasmid groups were shared between species, we hypothesised that this might greatly under-represent the true extent of plasmid-mediated horizontal gene transfer given the role of smaller mobile genetic elements and the fact that BSIs represent a tiny fraction of the overall ecological landscape. We therefore looked for evidence of overlap in the pangenome between different plasmid groups as well as between these and host chromosomes. Most genes in the pangenomes of common (i.e., containing n ≥10 plasmids) plasmid groups of E. coli and Klebsiella spp. were non-unique to their group (median % non-unique genes 88%, IQR 67–98%). Most overlap occurred amongst genes found in the plasmid pangenome from the same species (median % shared genes 86% (IQR 50–95%) vs 31% (8–43%) from different species, P < 0.001). There was also substantial overlap between plasmid group pangenomes and the chromosome pangenome, although there was some evidence of convergence in the chromosomally integrated mobilome between species, evidenced by less difference in the proportion of genes shared with the chromosome for the same vs different species (Supplementary Fig. S5, median 33% (IQR 0–45%) vs 21% (0–35%), respectively; P = 0.34).
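The "% non-unique genes" measure above can be pictured as a set comparison between one group's pangenome and the pooled pangenomes of all other groups. The following minimal sketch assumes each pangenome is available as a set of gene identifiers; the function name, group names and gene names are hypothetical.

```python
def percent_non_unique(pangenomes, group):
    """Percentage of genes in one group's pangenome also found in any other group."""
    own = pangenomes[group]
    if not own:
        return 0.0
    others = set().union(
        *(genes for name, genes in pangenomes.items() if name != group)
    )
    return 100.0 * len(own & others) / len(own)


# Hypothetical pangenomes for three plasmid groups.
toy_pangenomes = {
    "group_1": {"repA", "traI", "geneX", "geneY"},
    "group_2": {"repA", "traI", "geneZ"},
    "group_3": {"geneY", "geneW"},
}
shared_1 = percent_non_unique(toy_pangenomes, "group_1")
shared_3 = percent_non_unique(toy_pangenomes, "group_3")
```

The same comparison could be restricted to groups from the same or a different host species, or run against a chromosomal pangenome, to mirror the contrasts described in the text.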

Plasmids associated with ARG carriage are often highly similar to those with no such genes

The 439 plasmids carrying at least one ARG were predominantly large (≥100,000 bp, 277/439, 63%), low copy number (median 1.80, IQR 1.63–2.37) and conjugative (347/439, 79%). Whilst most plasmid-borne ARGs were carried by plasmids clustering in a small number of groups (i.e., 81% (1674/2069) of ARGs were carried by 8 plasmid groups), 36% (170/474) of plasmids in these groups did not carry an ARG and all groups had at least one such member, highlighting that acquisition of ARGs in ARG-negative plasmid backbones represents a common risk across genetically divergent plasmid groups (Fig. 4). We repeated this analysis using group assignments given by COPLA (Plasmid Taxonomic Units) and PlasmidFinder (replicon typing) and found similar results (Supplementary Data 5), suggesting that this finding is robust to the choice of clustering method.

Fig. 4: Plasmid network where individual plasmids (nodes) are connected by edges if they cluster in the same group using the Louvain-based methodology and coloured according to the number of classes of ARGs that they carry.

Edge thickness is drawn proportional to the Jaccard distance (see methods) between plasmids. Multi-species clusters are denoted by black outlined shapes. Only plasmid groups with ≥10 members are shown. The ordering of clusters corresponds to that in Fig. 2. Labels above clusters denote the PlasmidFinder/COPLA taxonomic designations, respectively; plasmid groups are numbered consecutively from the top left. Source data are provided in the supplementary “Source Data” file.

Hybrid assembly reveals complex nested diversity associated with key AMR genes, significant chromosomal integration of ARGs and the presence of multiple copies in different contexts

Chromosomal integration of ARGs was common: for example, in E. coli, 56% (23/41) blaCTXM−15, 9% (2/22) blaCTXM−27, 14% (42/293) blaTEM−1, 42% (14/33) blaOXA−1, 39% (7/18) aac(3)-IIa and 5% (3/65) dfrA17 were chromosomally integrated. Among ARGs also seen at least once on a plasmid in our study, chromosomal integration was significantly more common in E. coli than in Klebsiella spp. (restricting to 2009 and 2018 only: 15% [324/2103] vs 8% [39/478], Chi-squared test P < 0.001). For E. coli, there was significantly more chromosomal integration in 2018 vs 2009 (19% [285/1485] vs 6% [39/618], Chi-squared test P < 0.001) but there was no evidence of this for Klebsiella spp. (7% [3/190] vs 6% [17/279], Chi-squared test P = 0.89). For most of these ARGs, there were multiple instances of isolates carrying two (and occasionally more) copies (9 such examples for blaCTXM−15 (Fig. 5), 1 blaCTXM−27, 29 blaTEM−1, 2 aac(3)-IIe and 1 dfrA7).
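The E. coli comparison between 2018 and 2009 can be checked from the reported counts with a Pearson chi-squared test on a 2x2 table. The sketch below uses no continuity correction, which is an assumption (the text does not say whether one was applied), and the function name is hypothetical.

```python
from math import erfc, sqrt


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Rows: 2018 and 2009; columns: chromosomally integrated vs not.
chi2, p_value = chi_squared_2x2(285, 1485 - 285, 39, 618 - 39)
```

With these counts the P value falls far below 0.001, in line with the threshold quoted in the text.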

Fig. 5: Nested genetic complexity associated with blaCTXM−15 mobilisation.

The “Tree” panel shows a neighbour-joining tree of Mash distances between chromosomes for isolates carrying a blaCTXM−15 gene. Tip colours represent species/ST/phylogroup. The chromosomal copy 1 and 2 panels show the genetic context 5000 bp up- and downstream from chromosomal copies of the blaCTXM−15 gene (shown in red); the plasmid copy panel shows this equivalent information for isolates carrying a plasmid-borne copy of this gene. The outlining colour in these panels shows the hierarchical cluster assignment of these flanking groups. The plasmid group panel shows group membership of plasmids carrying the blaCTXM−15 gene with each x axis position representing a distinct group and black bars showing the presence or absence of these for isolates in the tree. The encircled numbers denote: 1—different flanking sequences in the same ST, 2—different flanking sequences in the same plasmid group, 3—the same flanking group found in both chromosomal and plasmid contexts and 4—different plasmid groups harbouring the gene found within the same ST. Source data are provided in the supplementary “Source Data” file.

Given the global importance of the ESBL gene blaCTXM−15, which confers third-generation cephalosporin resistance, we focused on its genetic background and putative dissemination mechanisms. As mentioned above, plasmid groups carrying this gene in our dataset were generally species-constrained. However, within a single species, considering phylogroup, sequence type and even plasmid group, blaCTXM−15 was found in a variety of genetic contexts (Fig. 5). For example, in E. coli ST131 it was found in five plasmid groups and was chromosomally integrated in 41% (17/41) isolates. Within ST131 sub-clades, there was some evidence of vertical transmission, as well as numerous independent integration events. In many cases, several unique gene flanking regions were found in association with blaCTXM−15 within a single plasmid group, or identical flanking regions were shared across plasmid groups and between plasmid groups and chromosomes. Visual inspection of gene flanking regions and hierarchical clustering of a weighted graph (“Methods”: Bioinformatics) suggested that whilst there was substantial diversity, these flanking regions appear to have evolved in a stepwise manner with bilateral association of blaCTXM−15 and Tn2 in flanking groups 2, 3 and 6 compared to the presence of Kpn14 (groups 1 and 5) and IS26 (group 4) (Supplementary Fig. S6). Inspection of core-genome phylogenies of the two largest blaCTXM−15 carrying plasmid groups (plasmid groups 2 [IncF, PTU-FK] and 3 [IncF, PTU-FE] in Fig. 2) demonstrated multiple probable independent horizontal acquisition events of transposable units containing this gene (and other ARG cassettes; Supplementary Figs. S7 and S8), suggesting that a flexible capacity to acquire ARGs through diverse mobile genetic elements, rather than a fixed association with them, might be an important factor for the successful dissemination of the host plasmid.

Comparison with wider plasmid datasets highlights undersampled plasmid diversity, more widespread inter-species and inter-niche plasmid sharing, and the potential for carbapenemase dissemination amongst high-risk plasmid groups

We repeated our graph-based plasmid clustering method on a combined dataset of Oxfordshire plasmids (N = 1880, hereafter referred to as the “Oxfordshire dataset”) and the global collection of plasmids deposited in the NCBI (N = 10,159, denoted the “global dataset”) using the same sparsifying threshold (≤0.551). This yielded 5913 groups, of which 484 contained at least one plasmid from the “Oxfordshire dataset”; of these, 326/484 groups (67%) containing 536 plasmids appeared to be unique to Oxfordshire. In total, 79/484 (16%) of groups containing Oxfordshire plasmids were found in more than one species in the full dataset; of these 57 (72%) occurred in only a single species in the Oxfordshire dataset, highlighting the substantial underestimation of wider between-species dissemination by investigating only a single region and single source (i.e., bloodstream infections).
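The sparsifying step can be sketched as follows: compute a pairwise Jaccard distance between plasmids and keep only the edges at or below the 0.551 threshold quoted above. What the distance was computed over (for example k-mers or genes) is set out in the Methods; here each plasmid is simply assumed to be represented by a set of feature identifiers, and all names and toy data are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus the shared fraction of two feature sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def sparsified_edges(features, threshold=0.551):
    """Keep only plasmid pairs whose Jaccard distance is at or below the threshold."""
    edges = []
    for (p, fp), (q, fq) in combinations(features.items(), 2):
        distance = jaccard_distance(fp, fq)
        if distance <= threshold:
            edges.append((p, q, distance))
    return edges


# Hypothetical feature sets for three plasmids.
toy_features = {
    "plasmid_A": {1, 2, 3, 4},
    "plasmid_B": {2, 3, 4, 5},
    "plasmid_C": {7, 8, 9},
}
toy_edges = sparsified_edges(toy_features)
```

Only the closely related pair survives in the toy example; the resulting sparse graph is the kind of input on which a community detection method would then be run.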

A striking feature of the global network was that plasmids carrying carbapenemase genes clustered with those that did not (Fig. 6). Of 122 plasmid groups with at least one member carrying a carbapenemase gene, 19 (16%) contained at least one Oxfordshire plasmid. These included representatives from the K. pneumoniae MDR-associated Oxfordshire BSI dataset groups 2 and 5 (Fig. 2), three large groups (Fig. 2, groups 3/6/8) widely distributed amongst E. coli isolates and two groups of smaller plasmids (<100,000 bp, Fig. 2 groups 10 and 12), also widely distributed in Oxfordshire E. coli. Although only 2% (7/414) Oxfordshire plasmids falling into these groups actually carried a carbapenemase ARG, this suggests the potential for carbapenemase acquisition and dissemination amongst widespread “high-risk” plasmid backbones.

Fig. 6: Plasmids carrying carbapenemase genes are highly similar to plasmids without these genes found in Oxfordshire BSIs.

A Each horizontal bar represents a plasmid assembly either from Oxfordshire “Group 2/11” or the NCBI global dataset. Common genes are shown in colour (with BLAST identity between these shown from light grey to black, where the latter represents a perfect match), whereas genes unique to a given plasmid are shown in grey. B A network plot of plasmids which cluster with carbapenemase-carrying plasmids in the global network analysis. The “carbapenemase gene” grouping includes all those variants identified and classified as conferring resistance to the class “Carbapenem” in AMRFinder. Plasmids (nodes) are connected with an edge where the edge weight is ≤0.551 (see “Methods”). The thickness of the edges is displayed so that it is proportional to the edge weight. Source data are provided in the supplementary “Source Data” file.

Factors predictive of plasmid group success

Having demonstrated that most isolates carry a plasmid from a relatively small number of plasmid groups, we next sought to understand what factors might be driving the widespread dissemination of these amongst BSI isolates. Multivariable Poisson regression analysis revealed that plasmid group frequency (a subjective marker of evolutionary “success”) was associated with isolation in multiple species (adjusted rate ratio [aRR] 4.89, 95% CI 4.29–5.57, P < 0.001), capacity to conjugate (aRR 1.73, 95% CI 1.47–2.04) or mobilise (aRR 1.29, 95% CI 1.13–1.48; i.e., containing either a relaxase or oriT but missing a mate-pair formation marker), carriage of multiple ARGs (aRR 1.23, 95% CI 1.19–1.27), virulence genes (aRR 1.44, 95% CI 1.36–1.53) or toxin–antitoxin genes (aRR 1.32, 95% CI 1.18–1.47), and a higher GC content (aRR 1.01, 95% CI 1.00–1.03) (Supplementary Data 6). Carriage of ARGs (adjusted odds ratio [aOR] 2.88, 95% CI 1.53–5.41, P < 0.001) and isolation in multiple species (aOR 7.79, 95% CI 3.07–22.90, P < 0.001) were independently associated with a higher probability of plasmid groups being observed internationally (Supplementary Data 7).
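For readers less familiar with rate ratios, a Poisson regression coefficient and its standard error map onto a rate ratio and confidence interval by exponentiation. The sketch below assumes Wald-type intervals (an assumption; the interval method is not stated in this section) and back-calculates the implied coefficient from the reported multi-species estimate. The function names are hypothetical.

```python
from math import exp, log

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def rate_ratio_with_ci(beta, se):
    """Rate ratio and Wald 95% CI from a log-scale coefficient and its SE."""
    return exp(beta), exp(beta - Z_95 * se), exp(beta + Z_95 * se)


def coefficient_from_ci(rate_ratio, lower, upper):
    """Recover the log-scale coefficient and SE implied by a reported Wald CI."""
    beta = log(rate_ratio)
    se = (log(upper) - log(lower)) / (2 * Z_95)
    return beta, se


# Reported estimate for isolation in multiple species: aRR 4.89 (4.29 to 5.57).
beta, se = coefficient_from_ci(4.89, 4.29, 5.57)
rr, ci_low, ci_high = rate_ratio_with_ci(beta, se)
```

The round trip recovers the reported interval to two decimal places, which is what would be expected if the interval is symmetric on the log scale.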

Machine learning allows risk stratification of plasmids

Given that we have shown that plasmids carrying ARGs are often very similar to those with no such genes (“ARG-negative plasmids”), we hypothesised that it might be possible to predict whether ARG-negative plasmids pose a risk for eventual association with ARGs. To do this, we first performed a genome-wide association study using the Oxfordshire dataset only, which was corrected for population structure and plasmid size, to identify genes (excluding known ARGs) significantly more or less likely to be carried by plasmids in ARG-associated groups (i.e., plasmid groups where at least one member carries at least one ARG). This revealed significant associations between ARG-associated plasmid groups and the presence of insertion and transposon sequences, various virulence factors, toxin/antitoxin system and heavy metal resistance genes (Supplementary Data 8).

We then tested the predictive value of these elements to identify ARG-negative plasmids belonging to ARG-associated groups using a variety of models on the Oxfordshire dataset (see “Methods”) with stratified tenfold cross-validation to estimate out-of-sample performance. The best-performing model (Random Forest) had a mean accuracy of 90.3% (standard deviation [SD] 2.4%), mean area under the receiver operating characteristic curve [AUC] 0.90 (SD 0.02), mean sensitivity 86% (SD 4.3%) and mean specificity 93.4% (SD 2.9%). We re-trained the Random Forest model on ARG-negative plasmids in the global dataset using only plasmids sequenced prior to 2018 and subsequently made predictions on the held-out 2018 plasmids. This demonstrated that the model generalised well but was less sensitive on this dataset (accuracy 84.6%, AUC 0.82, sensitivity 73.7% and specificity 89.9%).
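Two pieces of the evaluation above lend themselves to a short illustration: assigning samples to stratified folds, and computing accuracy, sensitivity and specificity from predictions. The sketch below is a standard-library approximation under stated assumptions (binary labels where 1 marks membership of an ARG-associated group, hypothetical function names and toy data); it does not train a Random Forest and is not the authors' pipeline.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # deal out round-robin per class
    return folds


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels: 30 negatives and 20 positives, split into ten folds.
toy_labels = [0] * 30 + [1] * 20
folds = stratified_folds(toy_labels, k=10)

# Hypothetical predictions for eight held-out plasmids.
metrics = classification_metrics(
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 0, 1],
)
```

In a cross-validation run, the metrics would be computed once per held-out fold and then summarised as a mean and standard deviation, which is the form in which the results are reported above.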
