Abstract
The complete genome of Legionella lytica PCM 2298 was sequenced and analyzed to provide insights into its genomic structure, virulence potential, and evolutionary position within the Legionella genus. The genome comprised a 3.2 Mbp chromosome and two plasmids, pLlyPCM2298_1 and pLlyPCM2298_2, contributing to a total genome size of 3.7 Mbp. Functional annotation identified 3,165 coding sequences, including genes associated with known virulence factors such as the major outer membrane protein (MOMP), the macrophage infectivity potentiator (Mip), and a comprehensive set of secretion systems (type II, type IVA, and type IVB Dot/Icm type IV secretion system). Notably, L. lytica contributed 383 unique genes to the Legionella pangenome, with 232 identified effector proteins, of which 35 were plasmid-encoded. The identification of unique genes, particularly those on plasmids, suggests an evolutionary strategy favoring horizontal gene transfer and niche adaptation. The effector repertoire included proteins with domains characteristic of host interaction strategies, such as ankyrin repeats and protein kinases. Comparative analyses showed that while L. lytica shares core virulence traits with other Legionella species, it has distinct features that may contribute to its adaptability and pathogenic potential. These findings underscore the genomic diversity within the genus and contribute to a deeper understanding of Legionella’s ecological and clinical significance. A custom web application was developed using the R Shiny library, enabling users to interactively explore the expanded Legionella pangenome through UpSet plots.
Similar content being viewed by others
Introduction
Legionella spp. are common environmental bacteria that show the ability to parasitize in phylogenetically distant hosts, i.e. freshwater protozoa and human cells, specifically, lung macrophages. In the human host, bacteria can efficiently replicate within a phagocytic cell by forming a structure called the Legionella-containing vacuole (LCV). This sophisticated intracellular compartment provides a replication niche and allows bacteria to avoid phagolysosomal degradation, protect them from intracellular defenses, and maintain proper concentration of essential nutrients. This ability of Legionella spp. leads to the development of severe pneumonia called Legionnaires’ disease1. Legionellosis results mainly from inhalation of Legionella-contaminated aerosols and subsequent bacterial replication within alveolar macrophages2.
Approximately half of the 73 Legionella species identified to date (LPSN database accessed on November 7th, 2024)3 have been associated with human disease; however, the overwhelming majority of clinical infections (~ 90%) are caused by Legionella pneumophila4,5,6. There is a growing interest in bacteria of the genus Legionella, which is directly linked to increasing urbanization and global climate change. Air conditioning systems built in response to the drastic heat waves that affect many regions, even in temperate climates, are common places for Legionella spp. It is therefore crucial to better understand the ecology and virulence of bacteria belonging to this genus.
Comparative and functional genomics suggested that the Legionella–protozoa interaction has shaped the bacterial genome structure extensively7,8. A complex evolutionary scenario involving mobile genetic elements, type IV secretion systems, and horizontal gene transfer (HGT) between Legionella, amoebae, and other organisms was developed. Moreover, it was revealed that Legionella genomes are highly dynamic and characterized by frequent genetic exchange. The analysis of 80 Legionella genomes spanning 58 species revealed that genome sizes range from 2.37 Mbp (Legionella adelaidensis) to 4.88 Mbp (Legionella santicrucis), with GC content varying between 34.82% (Legionella busanensis) and 50.93% (Legionella geestiana). The GC content of Legionella genomes was inversely correlated with genome size, suggesting a strong involvement of HGT, which has resulted in an increase of AT-rich regions during evolution9. Importantly, of the 17,992 orthologous gene clusters identified, 5,832 (32%) were strain-specific, while only 1,008 genes (6%) comprised the core genome, demonstrating the high and diverse coding potential of this group of bacteria10.
Previous analyses indicated that the plasticity of Legionella genomes is associated with the presence of numerous mobile genetic elements, i.e. plasmids, conjugative elements, prophages, and genomic islands7. Plasmids have been found in L. pneumophila, Legionella longbeachae, Legionella fallonii, Legionella hackeliae, and Legionella anisa11,12,13. Currently, within public databases, plasmids are found in 35% of the complete genomes of Legionella spp., with sizes ranging from 13 to 373 kbp, constituting 0.1 to 16% of the total genome size. However, megaplasmids (defined as plasmids with a minimum size of 350 kbp)14 are rarely detected. Conjugation abilities have been shown only for a few L. pneumophila plasmids15. Although analysis of the Legionella accessory genome has indicated extensive interspecies and intraspecies gene flow, the full extent to which HGT occurs among the Legionella spp. remains largely unknown.
The long-term co-evolution of Legionella and amoeba has led to the development of highly sophisticated virulence strategies in these bacteria that allow for temporal and spatial fine-tuning of bacterial–host cell interactions. All sequenced Legionella species encode a highly conserved type IVB secretion system (T4SS), known as Dot/Icm (defective for organelle trafficking/intracellular multiplication)16. The Dot/Icm secretion system is recognized as the major pathogenicity factor of Legionella. The system components are encoded by 25 genes organized in two separate genomic regions with conserved gene order and orientation17. The Dot/Icm T4SS is essential for intracellular replication, spans both bacterial membranes, and translocates hundreds of virulence factors, termed effector proteins (effectors), directly into host cells. The effectors perform diverse functions, broadly involved in subverting lysosomal bacterial degradation and acquiring nutrients from the host cell2. The number of putative effectors encoded by various Legionella species varies widely, ranging from 52 in L. adelaidensis to 247 in Legionella waltersii17. Molecular analyses of L. pneumophila have identified more than 300 effector proteins as substrates of the Dot/Icm system, with many experimentally confirmed and assigned specific biochemical activities, targets, or functions during infection18. However, the pan-genomic pool of putative Legionella T4SS effectors is much larger, comprising the repository of over 18,000 proteins, including eukaryotic-like ones, which were shaped during evolution by interdomain gene transfer10. Noteworthy, the effector repertoires of different Legionella species were found to be largely non-overlapping, with only a few core effectors shared among all species. Species-specific effectors usually displayed low GC content, suggesting exogenous acquisition from natural protozoan hosts17.
Eukaryotic-like proteins and motifs are another hallmark of the Legionella genus. More than 200 eukaryotic-like proteins and 137 eukaryotic domains (the widest diversity of eukaryotic-like proteins known to date in bacteria), including a unique class of putative bacterial Rab GTPases, have been detected in Legionella genomes, suggesting that the manipulation of host signal transduction, protein turnover, and chromatin modification pathways are fundamental intracellular replication/pathogenicity strategies for Legionella spp.1,6,11. Additionally, other virulence-related traits in Legionella, particularly those involved in oxygen binding, iron storage, and host membrane transport, have been postulated to play significant roles in pathogenic mechanisms11.
The environmental strain Legionella lytica PCM 2298 was originally isolated as a new genus and species of obligate intracellular bacterial parasite of small free-living amoebae from the Acanthamoeba-Naegleria group19. The bacterium causing fatal infections of amoebae was initially named Sarcobium lyticum and reclassified as L. lytica based on 16S rRNA gene sequence analysis20. The entry of the bacteria into a host occurred by phagocytosis, but they grew directly in the cytoplasm rather than within phagosomes. This parasite was readily distinguished from other types of bacteria previously known to live inside amoeba cells by its host cell lytic activity. Morphologically, it is a Gram-negative, rod-shaped bacterium with tapered ends, and the cells are motile due to a polar tuft of flagella19.
Beyond PCM 2298, L. lytica has been identified in environmental samples such as hot spring water and other aquatic habitats, underscoring its adaptability to diverse ecological niches21,22. However, PCM 2298 remains the most studied strain of this species, with detailed structural studies revealing its unique cellular envelope composition, including a distinctive phospholipid and fatty acid profile that differentiates it from other Legionella species23,24.
In this study, the genome of L. lytica PCM 2298 was sequenced using a hybrid approach combining short- and long-read sequencing technologies. To the best of our knowledge, the presented genome is the first complete genomic sequence of the L. lytica species. The obtained high-quality genome was subjected to comprehensive annotation and comparative analysis to explore its metabolic potential and identify putative virulence factors.
Materials and methods
Culture conditions, DNA isolation, and sequencing
For genomic DNA isolation, L. lytica PCM 2298 was cultured in ACES-buffered yeast extract medium until reaching the mid-log phase. Cells were harvested by centrifugation at 4000×g for 10 min at 4 °C. DNA extraction was performed using the DNeasy Blood & Tissue Kit (Qiagen, Germany), with modifications for Gram-negative bacteria, including an enhanced lysis step with 20 mg/ml lysozyme at 37 °C for 30 min. For sequencing, DNA libraries were constructed using the NEBNext Ultra II DNA Library Prep Kit for Illumina and sequenced on the Illumina HiSeq 2500 system to generate 250 bp paired-end reads. Additionally, long reads were obtained using the Oxford Nanopore MinION sequencer with an R9.4 flow cell using 1D2 chemistry.
PFGE analysis
Genomic DNA was subjected to pulsed-field gel electrophoresis (PFGE) using the CHEF-DR III system (Bio-Rad). Electrophoresis was performed in a 0.5 × TAE buffer at 6 V/cm, with pulse times ranging from 1 to 20 s over 20 h at 14 °C. Lambda ladder PFGE markers (New England BioLabs) served as size standards. Gels were stained with ethidium bromide and visualized using the Vilber Lourmat gel documentation system. Band patterns were analyzed using ImageJ software to confirm the presence of specific replicons.
Genome assembly and annotation
Short reads were quality-checked and trimmed using Fastp25 to remove low-quality bases and adapters. Long reads were processed using Porechop to remove residual adapters and filtered to retain reads with a minimum length of 4000 bp, in accordance with standard practices for long-read data preprocessing26. The genome assembly utilized both short- and long-read data, co-assembled using Unicycler (v0.4.4) with default settings to achieve a highly accurate hybrid assembly27. Gaps and uncertain regions were curated through PCR and subsequent Sanger sequencing of the obtained amplicons. The combined genome coverage was approximately 100×. The quality of the assembled genome was assessed using QUAST (v5.0.2) and completeness and contamination levels were further verified using CheckM (v1.2.2)28,29. The contamination level was determined to be 1.87%, while genome completeness was 99.59%. Initial automated annotation was performed using the Prokaryotic Genome Annotation Pipeline (PGAP), and further manual curation of genes related to virulence and metabolism was done using the MAISEN tool for more precise annotation30,31. Circular genome maps were generated using the GenoVi tool32.
Functional genomic analysis
Functional annotation of genes into Clusters of Orthologous Genes (COG) categories was performed using the GenoVi tool32. To gain deeper insight into the specific functional aspects of the L. lytica genome, clusters of genes responsible for secondary metabolite biosynthesis were analyzed using antiSMASH v7.0.133, with core biosynthetic gene clusters (BGCs) compared across 65 Legionella genomes via BLASTp (≥ 40% identity, ≥ 70% coverage). The resulting data were visualized as a binary heatmap using the ComplexHeatmap R package, with hierarchical clustering applied to species based on BGCs presence/absence profiles.
Antibiotic resistance genes were identified through the Comprehensive Antibiotic Resistance Database (CARD)34. Virulence factors were identified using the VFanalyzer tool, an online interface of the Virulence Factor Database (VFDB) with default settings35. Raw VF counts were normalized using Z-scores to highlight deviations from the mean. Heatmaps of VF categories were generated using ComplexHeatmap, with hierarchical clustering applied to identify genome-level VF profile similarities. Custom Python and R scripts were used for additional data processing and visualization.
Phylogenomics analysis
Phylogenetic relationships were explored using autoMLST with the de novo workflow, which automates the selection of single-copy genes and constructs Maximum-Likelihood species trees from scratch36. Phylogenomics analyses were performed using the complete genome of L. lytica PCM 2298 alongside 65 representative Legionella genomes. Digital DNA-DNA hybridization (dDDH) values were calculated using the TYGS server, employing a threshold of ≤ 70% for species delineation, consistent with established standards3.
Comparative genomics and pangenome analysis
For the comparative genomic analysis, a set of 65 reference genomes of Legionella species was downloaded from RefSeq (Sup. Tab, 1). Pangenome analysis was performed using Roary with the -i flag set to 70 to assess shared and unique genes across the species37. General gene annotations for Legionella genomes were derived from the NCBI PGAP30, while additional functional annotations and metabolic pathways reconstructions were done using eggNOG mapper v2 and the KEGG database for comparative purposes38,39.
To facilitate the visualization of the pangenome, a custom web application was developed using R and the R Shiny library (https://www.R-project.org/, https://CRAN.R-project.org/package=shiny). The app dynamically generates UpSet plots to illustrate the distribution of core and accessory genes across the analyzed genomes. The interactive tool is publicly available at https://koper86.shinyapps.io/roary_pangenome/, allowing users to explore pangenome characteristics in real-time.
Effector protein prediction
The repertoire of potential T4SS effector proteins in L. lytica was predicted using OPT4e, a machine-learning-based tool that employs a Support Vector Machine algorithm. This tool uses statistically selected features, including amino acid composition, sequence motifs, and structural properties, to optimize the identification of effector proteins40.
To further characterize the predicted effector proteins, Pfam domain annotations were extracted from the functional annotations generated by eggNOG-mapper39.
Results
Genome structure of L. lytica
The genome of L. lytica PCM 2298 was sequenced de novo using a combination of short- and long-read sequencing technologies. The total genome size of L. lytica is 3.7 Mbp, with a GC content of 39.5%. The genome consists of a single chromosome of 3,198,081 bp and two plasmids, named pLlyPCM2298_1 (hereafter referred to as p1) and pLlyPCM2298_2 (hereafter referred to as p2), with sizes of 373,394 bp and 136,522 bp, respectively (Fig. 1). The genome structure of L. lytica is consistent with the replicon profiles previously determined by the PFGE separation (Sup. Figure 1—PFGE).
The total number of determined coding sequences (CDSs) was 3165, with 2743 located on the chromosome, 292 on plasmid p1, and 130 on plasmid p2. This corresponds to a gene density of 857 CDSs per 1 Mbp for the chromosome, and 778 and 955 CDSs per 1 Mbp for plasmids p1 and p2, respectively. Of all predicted CDSs, 2,393 were functionally annotated, giving a functional annotation level of 75%. To further improve this value, selected CDSs were manually curated using the MAISEN tool resulting in a final functional annotation level of 80%. Additionally, 12 rRNAs (4 complete chromosomal 16S–5S–23S rRNA clusters), 43 tRNA genes, and 3 ncRNA genes (RNAse P M1 RNA, 6S RNA, and SRP RNA) were identified.
The predicted genes of L. lytica were functionally classified into COG groups. Interestingly, the highest number of genes (320) were classified into the COG M category (cell wall/membrane/envelope biogenesis). Amongst other COGs overrepresented in the genome were COG T (signal transduction mechanisms) with 264 genes, COG J (translation, ribosomal structure, and biogenesis; 258 genes) and categories comprising unknown genes or with general function prediction (COG S and R, respectively; 376 genes in total) (Fig. 2A).
Comparative distribution and frequency of L. lytica PCM 2298 genes across COG categories. (A) The horizontal stacked bar plot represents the number of genes categorized into COG functions for the three replicons: the chromosome and plasmids p1 and p2. The stacked bars indicate the contribution of each replicon to the total number of genes assigned to each COG category. The colors within each bar distinguish the chromosome (darkest shade), plasmid p1, and plasmid p2 (lightest shade). 'Unc.' denotes genes that were not classified into any COG category. The COG categories are grouped into: Cellular processes and signaling: D (Cell cycle control, cell division, chromosome partitioning), M (Cell wall/membrane/envelope biogenesis), N (Cell motility), O (Posttranslational modification, protein turnover, chaperones), T (Signal transduction mechanisms), U (Intracellular trafficking, secretion, and vesicular transport), V (Defense mechanisms), Z (Cytoskeleton); Information storage and processing: J (Translation, ribosomal structure, and biogenesis), K (Transcription), L (Replication, recombination, and repair); Metabolism: C (Energy production and conversion), E (Amino acid transport and metabolism), F (Nucleotide transport and metabolism), G (Carbohydrate transport and metabolism), H (Coenzyme transport and metabolism), I (Lipid transport and metabolism), P (Inorganic ion transport and metabolism), Q (Secondary metabolite biosynthesis, transport, and catabolism); Poorly characterized: R (General function prediction only), S (Function unknown). (B) The heatmap represents the percentage of genes assigned to each COG category for the chromosome, plasmid p1, and plasmid p2. The intensity of color correlates with the relative frequency of genes within each COG category, with darker shades indicating a higher percentage.
The distribution of functional genes (COGs) in plasmid p1 was generally similar to that of the chromosome. The main differences were a higher percentage of genes classified in COG T (10.8% for p1 vs. 6.9% for the chromosome) and a lower content of genes in COG M (6.5% for p1 vs. 9.6% for chromosome), as well as those related to translation and ribosome synthesis (COG J). Interestingly, in plasmid p2, 46 out of 151 genes were associated with DNA recombination and horizontal gene transfer (9.9% in COG L and 7.9% in COG X, respectively), and signal transduction (12.6% in COG T) (Fig. 2B).
The plasmids of L. lytica encode genes related to carbohydrate and coenzyme metabolism (COG G and H groups, respectively), with lower absolute numbers and percentages compared to the chromosome. Additionally, genes from poorly characterized COG groups were identified on the plasmids, with some percentages exceeding those found in the chromosome.
L. lytica is a distinct genomospecies
Historically, the phylogenetic classification of Legionella species has relied heavily on single-gene analyses, particularly of the 16S rRNA gene17,20. In the case of L. lytica, initial classification positioned the species near the Legionella-like amoebal pathogens (LLAP), a result further supported by the bacterium’s poor growth on BCYE medium19,20,41.
In this study, we expanded upon previous work by analyzing the complete genome sequence of L. lytica, enabling a more comprehensive phylogenomic approach. Using the complete genome, we re-evaluated the phylogenetic placement of L. lytica within the Legionella genus by comparing it against the complete genome sequences of 65 reference Legionella strains. In addition, digital DNA-DNA hybridization (dDDH) calculations were performed using the TYGS platform to assess genomic similarity. L. lytica showed the highest dDDH values with L. rowbothamii LLAP6 (51.3%) and L. saoudiensis LH-SWC (42.2%). Based on established thresholds for overall genome relatedness indices (OGRI), with a cutoff of ≤ 70% for species delineation, L. lytica is confirmed as a distinct genomospecies. The phylogenetic tree generated by autoMLST corroborated this finding, placing L. lytica PCM 2298 on the same node as L. rowbothamii LLAP6 and L. saoudiensis LH-SWC (Fig. 3).
Maximum-likelihood phylogenomic tree based on 84 concatenated core genes, constructed using the autoMLST tool. Fifty reference genomes most closely related to L. lytica PCM 2298 were selected based on MASH ANI analysis from a set of 65 genomes analyzed in this study. Pelobacter propionicus was used as an outgroup (OG) to contextualize the diversity within Legionella species. Bootstrap values greater than 70 are indicated at the nodes. The bar represents sequence divergence.
L. lytica enriches the Legionella genus pangenome with more than 380 genes
The pangenome of the Legionella genus was constructed using 65 reference genomes of Legionella spp., along with L. lytica. This analysis revealed a pangenome consisting of 63,051 genes, with 244 classified as core genes (99% ≤ strains ≤ 100%) and 361 as soft core genes (95% ≤ strains < 99%). The majority of genes were categorized as either shell genes (2,930 genes; 15% ≤ strains < 95%) or cloud genes (59,516 genes; 0% ≤ strains < 15%), indicating substantial accessory genome diversity within the Legionella genus (Sup. Tab. 2). L. lytica contributed 383 unique genes, further enriching the Legionella pangenome’s genetic diversity (Sup. Tab. 3).
A custom web application was developed using the R Shiny library to visualize this expanded pangenome, allowing users to interactively explore core and accessory gene profiles through UpSet plots. The tool, available at https://koper86.shinyapps.io/roary_pangenome/, provides a platform for real-time exploration of pangenome characteristics.
Functional annotation reveals specific metabolic traits encoded in L. lytica genome
To gain a deeper understanding of the coding and metabolic potential, the Legionella pangenome was annotated against the KEGG database. While the pangenome analysis based on protein similarity identified numerous unique genes, further annotation using KEGG pathways revealed only a limited number of functions that were specific to L. lytica. Given that KEGG annotation does not encompass all genes, some functions remain undetected in this analysis. Special focus was placed on functions that were identified as rare or specific to L. lytica within the analyzed dataset. Only six such functions were identified in the L. lytica genome (Table 1), with each function’s occurrence across reference genomes listed.
The chromosomal gene J2N86_RS04905 has been annotated as coding for putative D-alanyl-D-alanine dipeptidase (EC 3.4.13.-), a function uncommon within the Legionella genus. According to the BioCyc database, this enzyme plays a critical role in the hydrolysis of the D-alanyl-D-alanine dipeptide, an essential step in peptidoglycan biosynthesis42. Although vancomycin resistance mediated by this enzyme is not relevant for Gram-negative bacteria like L. lytica, due to their outer membrane providing inherent resistance to glycopeptide antibiotics, studies suggest alternative functional roles. For instance, VanX homologs in Gram-negative bacteria, such as E. coli, have been implicated in processes like peptidoglycan recycling during stationary phases or oxidative stress survival. Similarly, we hypothesize that J2N86_RS04905 may contribute to L. lytica’s adaptation to environmental stresses, potentially linked to cellular detoxification or metabolic recycling pathways. Further experimental studies are needed to validate its exact role in L. lytica43.
The next rare gene, J2N86_RS01105, encodes a putative squalene synthase (EC 2.5.1.21), an enzyme involved in the biosynthesis of sesquiterpenoids by catalyzing the conversion of farnesyl pyrophosphate into squalene. This is a remarkable feature, as prokaryotic squalene synthases have been less studied compared to their eukaryotic counterparts. The presence of this enzyme in L. lytica contributes to the metabolic diversity observed within the Legionella genus and may reflect specific ecological or physiological adaptations44.
The gene J2N86_RS04065, located on the chromosome, encodes the predicted enzyme undecaprenyl-diphosphatase BcrC (EC 3.6.1.27). This enzyme, similar to the bacitracin resistance protein BcrC of Bacillus licheniformis, plays a critical role in bacterial resistance to bacitracin by modulating the cell’s sensitivity and resistance through the regulation of the gene expression. BcrC is involved in glycan biosynthesis and metabolism, particularly in peptidoglycan biosynthesis, by controlling the levels of undecaprenyl phosphate, a carrier lipid essential for bacterial cell wall synthesis45.
In addition, two rare genus-wide functions are encoded by the p1 plasmid: putative gamma-tocopherol C-methyltransferase (EC 2.1.1.95) and oxygen-independent coproporphyrinogen III oxidase (EC 1.3.99.22), encoded by genes J2N86_14145 and J2N86_15350, respectively. The first enzyme plays a crucial role in the biosynthesis of vitamin E, specifically facilitating the conversion of gamma-tocopherol to alpha-tocopherol. This step is key in producing the most biologically active form of vitamin E, suggesting that L. lytica may utilize this enzyme in metabolic strategies related to antioxidant protection or interaction with the host46. The putative oxygen-independent coproporphyrinogen III oxidase catalyzes the conversion of coproporphyrinogen III to protoporphyrinogen IX, a critical step in heme biosynthesis. Heme is an essential cofactor for various biological processes, including oxygen transport and electron transfer. The oxygen-independent nature of this enzyme suggests that L. lytica may possess a versatile heme biosynthetic pathway, allowing it to thrive under varying oxygen conditions, potentially contributing to its adaptability and survival in diverse environmental niches or hosts.
Gene clusters for biosynthesis of secondary metabolites and antibiotic resistance are distributed across different parts of the L. lytica genome
Using the antiSMASH tool, several BGCs were identified in the L. lytica genome. The most significant, showing the highest degree of similarity to known biosynthesis clusters, were two gene sets located on the chromosome. The first region, named chr_BGC1 (coordinates 11,842–112,725 bp), contained a trans-AT polyketide synthase, type I polyketide synthase, and non-ribosomal peptide synthetase (NRPS), exhibiting 50% similarity to the legioliulin biosynthetic cluster. Legioliulin is an isocoumarin compound responsible for blue-white autofluorescence in Legionella under long-wavelength UV light47. The second chromosomal region, chr_BGC2 (coordinates 1,476,460–1,499,469 bp), encodes a beta-lactone synthetase, with 13% similarity to known fengycin clusters48.
Moreover, plasmid-borne BCGs were also found, including: p1_BGC1 (a putative cyclodipeptide synthase, CDPS, located at 37,243–57,980 bp on p1), p1_BGC2 (a type III polyketide synthase, T3PKS, at 119,879–161,027 bp on p1), p1_BGC3 (a type I polyketide synthase, T1PKS, at 163,210–211,972 bp on p1), and p2_BGC4 (a combined T1PKS/NRPS cluster at 58,917–107,679 bp on p2).
To get deeper inside into these specific gene clusters, a comparative analysis was conducted by performing a BLASTp search of the core biosynthetic gene cluster genes against 65 Legionella reference genomes. Hits with at least 40% identity and 70% coverage were considered significant. The occurrence of these BGCs across different species was then examined, and the taxonomy of the species harboring these clusters was analyzed. This approach allowed us to determine the distribution of these core BGCs among Legionella species, highlighting both shared and specific biosynthetic capacities, as visualized in Fig. 4. Among the identified BCGs, the chromosomal T1PKS cluster involved in legioliulin biosynthesis, along with the plasmid-borne clusters p_BGC3 (T1PKS) and p_BGC4 (T1PKS/NRPS), were the most commonly found across Legionella species. Notably, these clusters were consistently observed together, suggesting a potential functional or evolutionary linkage. The co-occurrence of these gene clusters across multiple species underscores their likely importance in the metabolic capacity and adaptation of Legionella, particularly in polyketide biosynthesis.
Presence and distribution of biosynthetic gene clusters (BGCs) identified in the L. lytica PCM 2298 genome across various Legionella species and related genera. The heatmap displays the presence (red) or absence (white) of BGCs, including type I polyketide synthase (T1PKS), beta-lactone, CDPS (cyclic dipeptide synthase), and other polyketide synthase clusters (T3PKS, T1PKS), based on a 70% identity threshold of major genes within each cluster. The genomic locations of these BGCs in the L. lytica PCM 2298 genome are shown on the left, along with chromosome and plasmid identifiers and their respective nucleotide positions. The dendrogram above represents the hierarchical clustering of the species based on the similarity of their BGC profiles.
Additionally, a CARD analysis of the L. lytica PCM 2298 genome identified two core antibiotic resistance genes, both classified as class A beta-lactamases (locus tags: J2N86_RS13290 and J2N86_RS01165). Importantly, both genes were chromosomally located and implicated in resistance to beta-lactam antibiotics, a broad class that includes penicillins, cephalosporins, and carbapenems. In general, Legionella are known for their resistance to beta lactams, and this is a typical feature of this bacterial genus49.
Genes encoding virulence factor categories related to antiphagocytosis, chemotaxis and motility, and toxins biosynthesis are more abundant in L. lytica
The virulence potential of L. lytica was assessed by comparing its gene content with the curated and comprehensive database of bacterial virulence factors (VFDB), containing information about the genes encoding well-characterized virulence factors (VFs). This analysis included a comparison of L. lytica PCM 2298 with other reference Legionella species, providing a detailed examination of VF profiles across the genus. Following the analysis, a heatmap was generated to visualize Z-score normalized counts of recognized VFs within specific VF categories, highlighting differences and similarities in VF content across the reference genomes (Fig. 5).
Heatmap of Z-score normalized VF counts across representative Legionella genomes. This heatmap illustrates the distribution of VFs across representative genomes of different Legionella species, analyzed using the VFDB pipeline. Each row corresponds to a distinct VF class, while each column represents a genome. The Z-scores indicate deviations from the mean VF count for each class, with red indicating higher-than-average VF counts, blue indicating lower-than-average counts, and white representing values near the mean. Genomes are clustered based on their VF profiles, with genus and species names displayed in italics at the top of the heatmap. The VF profile of L. lytica is marked with a red asterisk.
The comparison clearly shows that L. lytica encodes all the relevant and characteristic virulence factors typical for the Legionella genus. Diverse surface structures of L. pneumophila, such as lipopolysaccharides (LPS), flagella, pili, major outer membrane protein (MOMP), macrophage infectivity potentiator (MIP), and the 60-kDa heat shock protein (Hsp60), have been recognized to play a potential role in pathogenesis50. Genes encoding all these elements are present in the L. lytica PCM 2298 genome (Sup. Tab. 4).
Notably, L. lytica exhibits significant Z-score deviations in several VF categories compared to other Legionella species, including antiphagocytosis, chemotaxis and motility, and biosynthesis of toxin. In the antiphagocytosis category, L. lytica exhibited a Z-score of 1.5, indicating higher-than-average VF counts (Fig. 5). In this category, the genes wcbC (J2N86_RS13895) and wzm (J2N86_RS13920), potentially involved in extracellular capsule synthesis, were found in only a few species, such as L. longbeachae, L. massiliensis, and L. qingyii (Sup. Tab. 4). Capsule structures are critical for bacterial pathogens, providing a physical barrier against phagocytosis by host immune cells, effectively shielding the bacteria from the host’s primary defense mechanisms. Furthermore, the extracellular capsule can interfere with recognition and response by the host immune system, thereby enhancing the bacteria’s ability to persist and proliferate within the host environment51. To date, L. longbeachae is the only Legionella species in which a capsule has been identified52.
In addition, L. lytica carries the gene J2N86_RS15225 on plasmid p1, which encodes a putative type III hemolysin, a homolog of the hlyIII gene from Bacillus cereus53, resulting in the high Z-score in the toxin biosynthesis category (Fig. 5). While Legionella species are known for hemolytic properties and genes encoding various hemolysins have been found in their genomes, this is the first report of a plasmid-encoded hemolysin gene in the genus 54,55.
Particular attention was given to the genes encoding protein secretion systems identified in Legionella spp. The genomes of L. pneumophila are known to code for various secretion systems, such as the putative type I Lss, type II PilD-dependent Lsp, type IVA Lvh, and type IVB Dot/Icm, while in the case of non-pneumophila strains there is much more variation, with individual secretion systems distributed randomly56. In the case of L. lytica PCM 2298, complete Lsp type II and Lvh type IVA systems were found on the chromosome, as well as both coding regions for the T4BSS Dot/Icm system (Sup. Tab. 4).
Distinct effectors of L. lytica enrich the pathogenic potential of the Legionella genus
The analysis of the L. lytica genome revealed a complete type IVB secretion system (T4BSS) encoded on the chromosome. The T4BSS is known for its role in facilitating intracellular infection by translocating effector proteins into host cells, a critical aspect of Legionella pathogenicity. To further investigate the repertoire of L. lytica's virulence factors, we employed the OPT4e tool.
The OPT4e predictions identified 232 potential effector proteins encoded within the L. lytica PCM 2298 genome (Sup. Tab. 5). Notably, 35 of these effectors are plasmid-encoded, with 30 located on plasmid p1 and five on plasmid p2. Interestingly, 63 of the 232 potential effector proteins encoded in the L. lytica genome were part of the 383 unique gene sets detected in the Legionella pangenome, including 19 effectors encoded on plasmids (Sup. Tab. 3).
Furthermore, our analysis confirmed that L. lytica harbors all seven core effectors identified by Burstein et al., which are conserved across the Legionella genus and play essential roles in intracellular replication and host manipulation. This conservation underscores the functional relevance of these effectors across Legionella species, while the additional distinct effectors identified in L. lytica may contribute to species-specific adaptations and pathogenic traits. These findings align with broader observations from genomic studies, which showed that effector repertoires across Legionella species are largely non-overlapping, with only a few core effectors shared among species17.
To gain a more comprehensive view, protein domain identification analysis was performed using the Pfam database. This analysis revealed the presence of 56 distinct protein domains within the set of effectors analyzed. The most abundantly represented elements were ankyrin repeats, protein kinase domains, and patatin phospholipase (Fig. 6). Interestingly, many of the identified domains are of eukaryotic origin, which is a characteristic feature of effector proteins in the Legionella genus10.
Stacked bar plot of Pfam domains identified within potential effector proteins encoded in the L. lytica PCM 2298 genome. The bars represent the counts of each domain, with “Known” in blue and “Novel” domains in orange. Domains are sorted alphabetically, and only those with at least two counts are included. The “Known” category includes predicted effectors that are shared with other genomes in the analyzed dataset, while the “Novel” category represents effectors from the unique portion of the pangenome, not shared with other Legionella species.
Discussion
Genomic architecture and adaptive potential
The genome structure of L. lytica PCM 2298, with its notably large plasmid p1, represents a unique feature among known Legionella genomes. While L. pneumophila is extensively studied, insights into non-pneumophila species, such as L. lytica, are essential for a more comprehensive understanding of Legionella genomic diversity and adaptability. The analysis of these non-dominant species helps reveal structural variations that may contribute to niche specialization and environmental adaptability.
The genome of L. lytica, with its 3.7 Mb size and 39.5% GC content, falls within the known variability of the Legionella genus, which ranges from 2.37 Mb in L. adelaidensis to 4.88 Mb in L. santicrucis, and GC content spanning from 34.82% in L. busanensis to 50.93% in L. geestiana. This places L. lytica near the midpoint of both genome size and GC content variability, reflecting the diverse genomic architecture seen across the genus17.
The COG distribution in L. lytica, resembles the one for other Legionella genomes and reflects its both host-adapted lifestyle and ecological versatility. According to Hugoson et al., the evolution of Legionella genomes, particularly in COG categories such as cell wall/membrane biogenesis and signal transduction, aligns with their need for structural resilience and environmental responsiveness. This is evident in the substantial representation of COGs involved in cellular processes and signaling57.
Plasmids in Legionella spp. have been shown to carry genes that are essential for adaptation to specific environmental pressures, such as survival in amoebal hosts and resistance to stressors commonly encountered in water systems58. The full significance of plasmid content in Legionella remains underexplored, as the majority of genomic data available pertains to L. pneumophila, with far less known about other species.
Plasmid p1, the largest plasmid identified within the Legionella genus according to currently available genomic data, shows a notable enrichment in COG T (signal transduction mechanisms). This indicates that p1 may be involved in regulatory processes that enhance the bacterium’s ability to respond to environmental stimuli. Such functions are crucial for survival in host-associated or variable environments, where rapid adaptation is necessary59.
In contrast, plasmid p2, although smaller, carries a higher proportion of genes related to COG L (replication, recombination, and repair) and COG X (mobilome elements, such as transposons). This suggests that p2 plays a pivotal role in genetic plasticity, facilitating horizontal gene transfer and genetic rearrangements. The higher representation of these COG categories highlights p2’s potential function in promoting genomic variability and adaptability. This aligns with the concept that smaller plasmids often act as reservoirs for genes that confer adaptive advantages60.
Furthermore, the presence of genes representing poorly characterized COG groups on the plasmids, with some percentages exceeding those found in the chromosome, suggests that plasmids might harbor unique and unexplored functions. These functions could provide adaptive advantages in specific, yet unrecognized, environmental conditions, influencing the bacterium’s phenotypic potential and overall metabolic diversity.
Phylogenomic position and pangenome
The phylogenetic positioning of L. lytica adjacent to L. rowbothamii and L. saoudiensis within cluster IV, as described in Gomez-Valero et al.'s work, highlights its shared evolutionary trajectory with other non-pneumophila species10. This cluster represents a lineage characterized by extensive gene acquisition events, potentially linked to ecological and virulence adaptations. Notably, members of this cluster, such as L. longbeachae, are capable of causing human infections, albeit in specific contexts. For instance, L. longbeachae is frequently associated with Legionnaires’ disease in Australia, where exposure to potted soil serves as a major source of infection1. This sugests that environmental niches play a critical role in shaping the virulence potential and transmission pathways of cluster IV species. The relatively larger genome sizes within this lineage, including L. lytica, suggest an adaptive strategy driven by the retention and integration of horizontally acquired genes, allowing them to exploit specific ecological niches and develop distinct virulence schemes. This evolutionary balance between intracellular specialization and environmental adaptability highlights the genomic plasticity that underpins the survival strategies of Legionella species in diverse ecological and host-associated niches.This observation is further supported by pangenome analysis, which identified 63,051 orthologous gene groups with only 244 core genes and a vast majority classified as shell and cloud genes. This finding is consistent with Burstein et al.'s study, which analyzed 38 Legionella species and found 16,416 clusters of orthologs with a core genome comprising only 1,054 genes17.
The presence of 59,516 cloud genes in Legionella pangenome aligns with the high variability and gene gain observed by Gomez-Valero et al., who documented that more gene acquisition events than losses have driven Legionella evolution, which is clearly exemplified by the presence of larger genomes across different Legionella phylogenetic clades. This pattern indicates that the Legionella genus has evolved to adapt to varied environments through extensive horizontal gene transfer and gene conservation strategies10.
Joseph et al. highlighted that in their analysis, 5832 (32%) of the identified orthologous gene clusters were singletons (strain-specific), further corroborating the notion of an open pangenome with significant gene diversity across species61. The 383 unique genes contributed by L. lytica follow this trend, demonstrating their role in expanding the functional capacity of the genus and enhancing its ecological adaptability.
The KEGG-based functional annotation of L. lytica revealed that, despite contributing 383 unique genes to the Legionella pangenome, only a limited number of unique functional categories were identified. This aligns with Gomez-Valero et al.'s observations that while Legionella species often possess large accessory genomes, their metabolic capabilities can vary significantly, reflecting specific ecological adaptations10. Notably, from the unique genes contributed by L. lytica, 199 were annotated as hypothetical proteins and 16 contained DUF (domain of unknown function) domains, suggesting that many of these singletons have uncharacterized functions (Sup. Tab. 3). This highlights the challenge of fully understanding the functional implications of such gene contributions, as a substantial portion of the accessory genome remains without clear functional classification.
Among these unique contributions, J2N86_RS04905, a putative D-alanyl-D-alanine dipeptidase, is particularly noteworthy. Although its canonical role in vancomycin resistance is irrelevant for Gram-negative bacteria like L. lytica, this gene may have an alternative function, potentially in peptidoglycan recycling or cellular stress responses, as suggested by homologous genes in other bacteria. Such processes could provide adaptive advantages under environmental stress or nutrient-limited conditions43.
Another rare metabolic feature, J2N86_RS01105, encodes a putative squalene synthase, which catalyzes sesquiterpenoid biosynthesis. The presence of this enzyme highlights the metabolic diversity of L. lytica and its potential ecological specialization. While prokaryotic squalene synthases are understudied, their role in L. lytica may reflect a specific adaptation, possibly related to membrane dynamics or stress resistance44.
Plasmid p1 of L. lytica also encodes gamma-tocopherol C-methyltransferase, facilitating the production of the biologically active form of vitamin E. This function might support antioxidant protection or host interactions. Another plasmid-encoded gene, oxygen-independent coproporphyrinogen III oxidase, suggests a versatile heme biosynthetic pathway that may enable L. lytica to thrive under diverse oxygen conditions, potentially contributing to its ecological adaptability46.
Virulence potential of L. lytica
A key aspect of the virulence profile of L. lytica is the wide range of protein secretion systems it possesses. The genome analysis revealed the presence of a complete type IVB Dot/Icm secretion system (T4BSS) on the chromosome, along with complete type II (Lsp) and type IVA (Lvh) secretion systems. These systems are essential for interacting with host cells, translocating effector proteins, and modulating host functions to support bacterial survival and replication.
OPT4e analysis identified 232 potential effector proteins in L. lytica, with 35 encoded on plasmids—30 on plasmid p1 and five on p2. Importantly, 63 of these effector proteins were among the set of 383 unique genes contributed by L. lytica to the Legionella pangenome. This supports findings by Gomez-Valero et al., who noted that effector repertoires in Legionella are highly diverse, with species-specific sets providing functional differentiation within the genus10. Such non-overlapping effector profiles suggest that L. lytica has evolved a pathogenic toolkit, that enhances its adaptability to different environmental and host conditions17.
The identification of 56 distinct protein domains within L. lytica effectors, including ankyrin repeats, protein kinase domains, and patatin phospholipase, reflects its complex interaction strategy with host cells. These domains, many of which are of eukaryotic origin, are known to modulate host cellular processes, further supporting the sophisticated nature of Legionella’s pathogenic mechanisms17. The presence of eukaryotic-like domains is consistent with the genus-wide adaptation to exploit host cell machinery for survival and replication.
Virulence factor analysis also highlighted the presence of key elements such as MOMP, Mip, and the complete set of genes for capsule synthesis, including wcbC and wzm. The capsule-associated genes, found in only a subset of Legionella species like L. longbeachae, suggest enhanced protection against phagocytosis, enabling L. lytica to better evade immune responses52. Additionally, the plasmid-encoded type III hemolysin gene (J2N86_RS15225) could facilitate membrane disruption during host infection, underscoring the contribution of plasmids to the unique virulence arsenal of L. lytica54.
Conclusions
The comprehensive analysis of L. lytica PCM 2298 reveals a genome equipped with both conserved and unique virulence features, underscoring its adaptability and potential for diverse host interactions. The presence of a complete set of secretion systems and a distinct effector repertoire positions L. lytica as a significant contributor to the Legionella genus’s genomic and functional diversity. We hypothesize that the identification of unique genes, particularly those on plasmids, suggests an evolutionary strategy favoring horizontal gene transfer and niche adaptation. Additionally, the development of a custom R Shiny web application provides an interactive platform for exploring the pangenome characteristics of Legionella, facilitating real-time data visualization and analysis. These insights enrich our understanding of Legionella pathogenesis and pave the way for further studies on the ecological roles and infection mechanisms of non-pneumophila species.
Data availability
The genome assembly of Legionella lytica PCM 2298 and associated raw sequencing reads are publicly available under the NCBI BioProject accession number PRJNA708212. The genome assembly can be accessed in GenBank under accession number GCF_023921225.1, and the raw reads are stored in the Sequence Read Archive (SRA) within the same BioProject.
References
Mondino, S. et al. Legionnaires’ disease: State of the art knowledge of pathogenesis mechanisms of Legionella. Annu. Rev. Pathol. 15, 439–466 (2020).
Chauhan, D. & Shames, S. R. Pathogenicity and virulence of Legionella: Intracellular replication and host response. Virulence 12, 1122–1144 (2021).
Meier-Kolthoff, J. P., Carbasse, J. S., Peinado-Olarte, R. L. & Göker, M. TYGS and LPSN: A database tandem for fast and reliable genome-based classification and nomenclature of prokaryotes. Nucleic. Acids. Res. 50, D801–D807 (2022).
Muder, R. R. & Victor, L. Y. Infection due to Legionella species other than L. pneumophila. Clin. Infect. Dis. 35, 990–998 (2002).
Palusińska-Szysz, M. & Drożański, W. J. Pathogenesis and virulence determinants of the family Legionellaceae. Postepy Hig. Med. Dosw. 21 (2006).
Gomez-Valero, L. & Buchrieser, C. Intracellular parasitism, the driving force of evolution of Legionella pneumophila and the genus Legionella. Microb. Infect. 21, 230–236 (2019).
Gomez-Valero, L., Rusniok, C., Cazalet, C. & Buchrieser, C. Comparative and functional genomics of Legionella identified eukaryotic like proteins as key players in host-pathogen interactions. Front. Microbiol. 2, 208 (2011).
Gomez-Valero, L. & Buchrieser, C. Genome dynamics in Legionella: The basis of versatility and adaptation to intracellular Replication. Cold. Spring Harb. Perspect. Med. 3, a009993 (2013).
Bohlin, J., Brynildsrud, O. B., Sekse, C. & Snipen, L. An evolutionary analysis of genome expansion and pathogenicity in Escherichia coli. BMC Genom. 15, 882 (2014).
Gomez-Valero, L. et al. More than 18,000 effectors in the Legionella genus genome provide multiple, independent combinations for replication in human cells. Proc. Natl. Acad. Sci. USA 116, 2265–2273 (2019).
Gomez-Valero, L. et al. Comparative analyses of Legionella species identifies genetic features of strains causing Legionnaires’ disease. Genome Biol. 15, (2014).
Bacigalupe, R., Lindsay, D., Edwards, G. & Fitzgerald, J. R. Population genomics of Legionella longbeachae and hidden complexities of infection source attribution. Emerg. Infect. Dis. 23, 750–757 (2017).
Svetlicic, E. et al. Genomic characterization and assessment of pathogenic potential of Legionella spp. isolates from environmental monitoring. Front. Microbiol. 13, (2023)
diCenzo, G. C. & Finan, T. M. The divided bacterial genome: structure, function, and evolution. Microbiol. Mol. Biol. Rev. 81, (2017).
Miniz, C. S., Fields, B. S. & Zou, C.-H. Isolation and characterization of a conjugative plasmid from Legionella pneumophila. J. Gen. Microbiol. 138, 1379–1386 (1992).
Segal, G., Feldman, M. & Zusman, T. The Icm/Dot type-IV secretion systems of Legionella pneumophila and Coxiella burnetii. FEMS Microbiol. Rev. 29, 65–81 (2005).
Burstein, D. et al. Genomic analysis of 38 Legionella species identifies large and diverse effector repertoires. Nat. Genet. 48, 167–175 (2016).
Lockwood, D. C., Amin, H., Costa, T. R. D. & Schroeder, G. N. The Legionella pneumophila Dot/Icm type IV secretion system and its effectors. Microbiology 168, 001187 (2022).
Drożański, W. J. Sarcobium lyticum gen. nov., sp. Nov., an obligate intracellular bacterial parasite of small free-living amoebae†. Int. J. Syst. Evol. Microbiol. 41, 82–87 (1991).
Hookey, J. V., Saunders, N. A., Fry, N. K., Birtles, R. J. & Harrison, T. G. Phylogeny of Legionellaceae based on small-subunit ribosomal DNA sequences and proposal of Legionella lytica comb. Nov. for Legionella-like amoebal pathogens. Int. J. Syst. Evol. Microbiol. 46, 526–531 (1996).
Huang, S.-W., Hsu, B.-M., Huang, C.-C. & Chen, J.-S. Utilization of polymerase chain reaction and selective media cultivation to identify Legionella in Taiwan spring water samples. Environ. Monit. Assess. 174, 427–437 (2011).
Huang, S.-W. et al. Water quality parameters associated with prevalence of Legionella in hot spring facility water bodies. Water Res. 44, 4805–4811 (2010).
Palusińska-Szysz, M., Choma, A., Russa, R. & Drożański, W. J. Cellular fatty acid composition from Sarcobium lyticum (Legionella lytica comb. nov.)—An intracellular bacterial pathogen of amoebae. Syst Appl Microbiol 24, 507–509 (2001).
Palusińska-Szysz, M., Kalityński, R., Russa, R., Dawidowicz, A. L. & Drożański, W. J. Cellular envelope phospholipids from Legionella lytica. FEMS Microbiol. Lett. 283, 239–246 (2008).
Chen, S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2, e107 (2023).
Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Completing bacterial genome assemblies with multiplex MinION sequencing. Microbial. Genom. 3, e000132 (2017).
Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLOS Comput. Biol. 13, e1005595 (2017).
Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: Quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Tatusova, T. et al. NCBI prokaryotic genome annotation pipeline. Nucleic Acids. Res. 44, 6614–6624 (2016).
Dziurzynski, M., Decewicz, P., Ciuchcinski, K., Gorecki, A. & Dziewit, L. Simple, reliable, and time-efficient manual annotation of bacterial genomes with MAISEN. Methods Mol. Biol. (Clifton, N.J.) 2242, 221 (2021).
Cumsille, A. et al. GenoVi, an open-source automated circular genome visualizer for bacteria and archaea. PLOS Comput. Biol. 19, e1010998 (2023).
Blin, K. et al. antiSMASH 7.0: New and improved predictions for detection, regulation, chemical structures and visualisation. Nucleic Acids Res. 51, W46–W50 (2023).
Alcock, B. P. et al. CARD 2020: Antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Res. 48, D517–D525 (2020).
Chen, L. et al. VFDB: A reference database for bacterial virulence factors. Nucleic Acids Res. 33, D325-328 (2005).
Alanjary, M., Steinke, K. & Ziemert, N. AutoMLST: An automated web server for generating multi-locus species trees highlighting natural product potential. Nucleic Acids Res. 47, W276–W282 (2019).
Page, A. J. et al. Roary: Rapid large-scale prokaryote pan genome analysis. Bioinformatics 31, 3691–3693 (2015).
Kanehisa, M. & Sato, Y. KEGG Mapper for inferring cellular functions from protein sequences. Protein Sci. 29, 28–35 (2020).
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol. Biol. Evo. 38, 5825–5829 (2021).
Esna Ashari, Z., Brayton, K. A. & Broschat, S. L. Prediction of T4SS effector proteins for anaplasma phagocytophilum using OPT4e, a new software tool. Front. Microbiol. 10, 1391 (2019).
Khodr, A. et al. Molecular epidemiology, phylogeny and evolution of Legionella. Infect. Genet. Evol. 43, 108–122 (2016).
Karp, P. D. et al. The BioCyc collection of microbial genomes and metabolic pathways. Brief Bioinform. 20, 1085–1093 (2017).
Lessard, I. A. et al. Homologs of the vancomycin resistance D-Ala-D-Ala dipeptidase VanX in Streptomyces toyocaensis, Escherichia coli and Synechocystis: attributes of catalytic efficiency, stereoselectivity and regulation with implications for function. Chem. Biol. 5, 489–504 (1998).
Pan, J.-J. et al. Biosynthesis of squalene from farnesyl diphosphate in bacteria: Three steps catalyzed by three enzymes. ACS Cent. Sci. 1, 77–82 (2015).
Bernard, R., El Ghachi, M., Mengin-Lecreulx, D., Chippaux, M. & Denizot, F. BcrC from Bacillus subtilis acts as an undecaprenyl pyrophosphate phosphatase in bacitracin resistance. J. Biol. Chem. 280, 28852–28857 (2005).
Naguib, M. M. & Valvano, M. A. Vitamin E increases antimicrobial sensitivity by inhibiting bacterial lipocalin antibiotic binding. mSphere 3, e00564-e618 (2018).
Ahrendt, T. et al. Biosynthesis of the natural fluorophore legioliulin from Legionella. Chem. Biochem. 14, 1415–1418 (2013).
Chen, X. H. et al. Comparative analysis of the complete genome sequence of the plant growth-promoting bacterium Bacillus amyloliquefaciens FZB42. Nat. Biotechnol. 25, 1007–1014 (2007).
Pappa, O. et al. Antibiotic resistance of Legionella pneumophila in clinical and water isolates—a systematic review. Int. J. Environ. Res. Public Health 17, 5809 (2020).
Yang, J.-L., Li, D. & Zhan, X.-Y. Concept about the virulence factor of Legionella. Microorganisms 11, 74 (2022).
Merino, S. & Tomás, J. M. Bacterial Capsules and Evasion of Immune Responses. in Encyclopedia of Life Sciences 1–10 (Wiley, 2015). https://doi.org/10.1002/9780470015902.a0000957.pub4.
Cazalet, C. et al. Analysis of the Legionella longbeachae genome and transcriptome uncovers unique strategies to cause legionnaires’ disease. PLOS Genet. 6, e1000851 (2010).
Baida, G. E. & Kuzmin, N. P. Mechanism of action of hemolysin III from Bacillus cereus. Biochim. Biophys. Acta 1284, 122–124 (1996).
Rdest, U., Wintermeyer, E., Ludwig, B. & Hacker, J. Legiolysin, a new hemolysin from L. pneumophila. Zentralbl. Bakteriol. 274, 471–474 (1991).
Flieger, A., Rydzewski, K., Banerji, S., Broich, M. & Heuner, K. Cloning and characterization of the gene encoding the major cell-associated phospholipase A of Legionella pneumophila, plaB, exhibiting hemolytic activity. Infect. Immun. 72, 2648–2658 (2004).
Qin, T., Zhou, H., Ren, H. & Liu, W. Distribution of secretion systems in the genus Legionella and its correlation with pathogenicity. Front. Microbiol. 8, 388 (2017).
Hugoson, E., Guliaev, A., Ammunét, T. & Guy, L. Host adaptation in Legionellales is 1.9 Ga, coincident with eukaryogenesis. Mol. Biol. Evol. 39, msac037 (2022).
Gomez-Alvarez, V., Boczek, L., Raffenberg, I. & Revetta, R. P. Closed genome and plasmid sequences of Legionella pneumophila AW-13–4, isolated from a hot water loop system of a large occupational building. Microbiol. Resour. Announc. https://doi.org/10.1128/mra.01276-20 (2021).
Peña-Montenegro, T. D. et al. Species-specific responses of marine bacteria to environmental perturbation. ISME Commun. 3, 99 (2023).
Wang, B., Finazzo, M. & Artsimovitch, I. Machine learning suggests that small size helps broaden plasmid host range. Genes 14, 2044 (2023).
Joseph, S. J. et al. Dynamics of genome change among Legionella species. Sci. Rep. 6, 33442 (2016).
Funding
This work was supported by the Polish National Science Centre (NCN) through the Miniatura-3 grant (grant number 2019/03/X/NZ2/00976) and the SONATA-18 grant (grant number 2022/47/D/NZ8/00258).
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. Material preparation, bioinformatics analysis, and manuscript writing were primarily performed by P.K. J.W. contributed to sequencing efforts, while M.K. was responsible for PFGE analysis. K.Ż., P.D., Ł.D., M.P.-S., and A.M. provided consultation and participated in the writing and review of the manuscript. The first draft of the manuscript was written by P.K., and all authors commented on previous versions.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Koper, P., Wysokiński, J., Żebracki, K. et al. Comparative analysis of Legionella lytica genome identifies specific metabolic traits and virulence factors. Sci Rep 15, 5554 (2025). https://doi.org/10.1038/s41598-025-90154-5
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-025-90154-5