Genomes of Prochlorococcus, Synechococcus, bacteria, and viruses recovered from marine picocyanobacteria cultures based on Illumina and Qitan nanopore sequencing

Wu, Qingtao; Gao, Jie; Sa, Boxuan; Cong, Hongtao; Deng, Wenjie; Zhang, Ying; Zhong, Xiaojie; Zhang, Jinyu; Wang, Liduo; Liu, Haizhou; Yan, Yi; Zhang, Yifei; Liu, Di; Yan, Wei

doi:10.1038/s41597-025-04762-x

Data Descriptor
Open access
Published: 12 April 2025

Qingtao Wu^1,2,3^na1,
Jie Gao^2,4^na1,
Boxuan Sa^1^na1,
Hongtao Cong^5,6,
Wenjie Deng^1,5,6,
Ying Zhang^2,7,
Xiaojie Zhong^5,6,
Jinyu Zhang^1,
Liduo Wang^1,
Haizhou Liu^2 (ORCID: orcid.org/0000-0002-4727-088X),
Yi Yan^2,4,
Yifei Zhang^7,
Di Liu^2,3,4,7 &
Wei Yan^1,6

Scientific Data volume 12, Article number: 612 (2025)

Abstract

Prochlorococcus and Synechococcus are key contributors to marine primary production and play essential roles in global biogeochemical cycles. Despite the ecological importance of these two picocyanobacterial genera, current genomic datasets still lack comprehensive representation of under-sampled ocean regions, associated bacteria and viruses. To address this gap, we used a combination of second- and third-generation sequencing technologies to assemble comprehensive genomic data from 105 picocyanobacterial enrichment cultures isolated from the Indian Ocean, the South China Sea, and the western Pacific Ocean. This dataset includes 55 Prochlorococcus and 50 Synechococcus genomes with high completeness (>98%) and low contamination (<2%), along with 308 non-redundant associated bacterial genomes derived from 1,457 medium- and high-quality non-cyanobacteria metagenome-assembled genomes (MAGs, completeness ≥50% and contamination ≤10%). Additionally, 2,113 non-redundant viral operational taxonomic units (vOTUs) were derived from a total of 7,632 qualified viral contigs. This dataset provides a valuable resource for improving our understanding of the complex interactions among Prochlorococcus, Synechococcus, and their associated bacteria and viruses in marine ecosystems, offering a foundation to study their ecological roles and evolutionary dynamics.

Background & Summary

Marine picocyanobacteria in the genera Prochlorococcus and Synechococcus are estimated to account for around 25% of oceanic net primary productivity¹. Prochlorococcus is the smallest (0.5–0.7 μm in diameter) and most abundant photosynthetic organism on Earth, with an estimated global population of approximately 10²⁷ cells². Notably, it possesses the smallest genome of any free-living phototroph, with some strains containing genomes as small as 1.65 Mbp comprising roughly 1,700 genes^3,4. This compact genome reflects its adaptation to oligotrophic, stable environments in the open ocean. However, the genetic diversity within Prochlorococcus populations is extensive, significantly contributing to their ecological resilience⁵. In contrast, Synechococcus has a broader biogeographical distribution, spanning tropical to subpolar regions and extending from coastal waters to the open ocean⁶. The expansive range of this genus is accompanied by a larger genome, which generally ranges from 2.2 to 2.86 Mbp and contains approximately 2,358 to 3,129 genes^7,8. Its enhanced phenotypic plasticity and regulatory versatility enable it to acclimate to diverse and fluctuating environmental conditions⁹. However, despite numerous studies highlighting the ecological significance and genomic characteristics of Prochlorococcus and Synechococcus^10,11,12,13, current genomic datasets still lack comprehensive representation of under-sampled ocean regions, associated bacteria and viruses^14,15,16. These limitations constrain our understanding of the functions and adaptations of these species within global marine ecosystems.

Prochlorococcus and Synechococcus are ecologically significant in marine ecosystems through their symbiotic relationships with heterotrophic bacteria, which help to maintain population stability and ecosystem balance¹⁷. Both genera are also subject to viral predation, which affects their population dynamics and genetic diversity¹⁸. Viral lysis releases organic matter and nutrients back into the environment, contributing to nutrient recycling¹⁹. It also facilitates horizontal gene transfer, enhancing the genetic diversity and adaptability of these picocyanobacteria²⁰. The dynamic interactions among picocyanobacteria, heterotrophic bacteria, and viruses highlight the complexity of marine microbial ecosystems and their significant role in global biogeochemical cycles. However, the currently available genomic datasets of associated bacteria and viruses are insufficient in both quantity and representativeness^21,22, which limits our ability to understand the importance of these interactions within ecosystems.

The limitations of second-generation sequencing technologies have contributed to the insufficient quality of current genomic datasets. The use of conventional sequencing methods often results in fragmented assemblies with lower completeness, leading to underestimates of microbial diversity and genomic variability^23,24. Additionally, short-read sequencing often fails to resolve complex genomic regions, resulting in incomplete or biased representations of these genomes and their communities²⁵. To address these limitations, we combined second- and third-generation sequencing to generate genome assemblies of Prochlorococcus, Synechococcus, and associated bacteria and viruses, achieving comprehensive genomic characterisation with fine resolution, rare variant detection, and deeper insights into microbial interactions.

To construct the dataset, we obtained 105 picocyanobacterial enrichment cultures from a total of 27 sampling stations across the Indian Ocean (2022), the South China Sea (2014, 2021), and the western Pacific Ocean (2022) (Fig. 1). All samples were sequenced using second-generation sequencing. Of these, 81 samples were further processed using a hybrid assembly approach that combined second-generation Illumina sequencing and third-generation Qitan nanopore sequencing. Metagenomic binning and refinement identified 55 Prochlorococcus genomes, including 23 high-light clade II (HLII) strains, one high-light clade I (HLI) strain, and 31 low-light clade I (LLI) strains. Additionally, 50 Synechococcus genomes were identified, consisting of 26 clade 5.1B strains, 16 clade 5.1A strains, and eight clade 5.2 strains, including major subclades such as 5.1B clade I (n = 14), 5.1A clade II (n = 8), 5.1B clade CRD1 (n = 5), 5.1B clade XVI (n = 3), and 5.1A clade XV (n = 2). All these genomes exhibited high completeness (>98%) and low contamination (<2%). Of these, 42 genomes were assembled into single contigs (Fig. 2).

A total of 1,457 medium- and high-quality non-cyanobacteria metagenome-assembled genomes (MAGs) (completeness ≥50% and contamination ≤10%) were recovered, from which 308 non-redundant MAGs were identified using dRep with a 95% average nucleotide identity (ANI) threshold for dereplication. These 308 non-redundant MAGs covered 18 bacterial phyla, including major groups such as Pseudomonadota (n = 142), Bacteroidota (n = 89), Bacillota (n = 25), Actinomycetota (n = 10), and Myxococcota (n = 6) (Fig. 3). From the same metagenomic assemblies, 2,113 non-redundant viral operational taxonomic units (vOTUs) were derived from 7,632 qualified viral contigs using a 95% ANI threshold over 85% of the length of the shorter contig. Among these vOTUs, 176 were identified as complete, 470 as high-quality, and 1,201 as medium-quality. Only 11.36% of the vOTUs could be classified against the RefSeqVirus and Prokaryotic Viral RefSeq (v201) databases, leaving the majority unclassified. A total of 28 viral families were identified, including Mimiviridae (n = 74), Peduovirinae (n = 32), Duneviridae (n = 20), Winoviridae (n = 15), and Casjensviridae (n = 12), as summarised in Table 1.
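As a quick cross-check of the counts quoted in the two paragraphs above, the following Python sketch recomputes the totals and proportions. All numbers are taken from the text; the variable names are illustrative only, and the derived values are plain arithmetic rather than figures reported by the authors.

```python
# Counts quoted in the text; variable names are illustrative only.
prochlorococcus = {"HLII": 23, "HLI": 1, "LLI": 31}
synechococcus = {"5.1B": 26, "5.1A": 16, "5.2": 8}

n_pro = sum(prochlorococcus.values())
n_syn = sum(synechococcus.values())
n_genomes = n_pro + n_syn

total_mags, nonredundant_mags = 1457, 308
viral_contigs, votus = 7632, 2113
votu_quality = {"complete": 176, "high-quality": 470, "medium-quality": 1201}

# Fraction of inputs that remain after dereplication at 95% ANI.
mag_retention = nonredundant_mags / total_mags
votu_retention = votus / viral_contigs

# Approximate number of vOTUs behind the reported 11.36% classified.
classified_votus = round(0.1136 * votus)

# vOTUs that fall into the three named quality tiers.
tiered_votus = sum(votu_quality.values())

print(f"picocyanobacterial genomes: {n_pro} + {n_syn} = {n_genomes}")
print(f"non-redundant MAGs: {mag_retention:.1%} of recovered MAGs")
print(f"vOTUs: {votu_retention:.1%} of qualified viral contigs")
print(f"classified vOTUs: about {classified_votus}")
print(f"vOTUs in the three named tiers: {tiered_votus} of {votus}")
```

The sums reproduce the reported 55 Prochlorococcus and 50 Synechococcus genomes, which is a useful check when reusing the per-clade counts.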

Table 1 Family-level annotation of viral operational taxonomic units (vOTUs) in samples.

Given the critical ecological roles of Prochlorococcus and Synechococcus in marine ecosystems, our dataset that includes these picocyanobacteria genomes as well as those of their associated bacteria and viruses fills significant gaps in current genomic resources. All of the primary reads, MAGs and vContigs have been deposited in the National Center for Biotechnology Information (NCBI) BioProject database and the figshare website. This dataset not only provides a comprehensive foundation for understanding the complex interactions among marine picocyanobacteria, their associated bacteria, and viruses, but also serves as a valuable resource for further studies on microbial ecology, evolution, and biogeochemistry under varying environmental and anthropogenic conditions.

Methods

Sample collection and preparation

Samples were collected from 27 different stations at depths of 25–150 m in the South China Sea (2014, 2021), the Indian Ocean (2022), and the western Pacific Ocean (2022) (Fig. 1). The isolation procedure followed the method described by Yan et al.¹³. In summary, seawater samples were obtained using a Niskin bottle and subjected to gravity filtration through double polycarbonate filters with a pore size of 0.6 μm (Millipore, Billerica, MA, USA), following the protocol described by Chisholm et al.²⁶. The filtrate was then enriched by adding a Pro2 medium nutrient stock solution²⁷ and incubated onboard for an initial enrichment period of 4 to 8 weeks. Flow cytometry was used to count fluorescent cells and to provide preliminary confirmation that picocyanobacterial populations had been successfully enriched. The enrichment cultures were maintained at 22 °C under continuous light with an intensity of 10–20 μmol photons m⁻² s⁻¹.

DNA isolation, library preparation, and sequencing

Second-Generation Sequencing Method: Genomic DNA was extracted from 5-mL aliquots of the laboratory cultures using a TIANamp Bacteria DNA Kit (Tiangen, Beijing, China) according to the protocol of Yan et al.¹³, followed by centrifugation at 12,000 × g for 30 minutes. The extracted DNA (1 µg) was fragmented using a Covaris ME220 Focused-ultrasonicator (Covaris, Woburn, MA, USA). The DNA library was constructed using the NEBNext® Ultra™ DNA Library Prep Kit for Illumina® (NEB, Ipswich, MA, USA) according to the manufacturer’s instructions. Ten nanograms of the prepared library DNA were used for sequencing. Paired-end sequencing was performed on the Illumina NovaSeq 6000 platform with a read length of 150 bp. All library preparation and sequencing procedures were performed at Shanghai Majorbio Bio-pharm Technology Co., Ltd. (Shanghai, China).

Third-Generation Sequencing Method: The extracted DNA was processed using the QitanTech DNA Library prep kit QDL-E V1.1 (QitanTech, Chengdu, China), following the manufacturer’s protocol. Library quantification was performed with a Qubit fluorometer using the dsDNA HS Assay Kit, and size distribution was assessed with an Agilent 4200 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). Sequencing was carried out on the QNome-3841 nanopore sequencing platform (QitanTech) using the QitanTech DNA Sequencing Kit QDS V1.1.

Metagenomic assembly, gene annotation, and quality control

All metagenomic assemblies were performed on the KBase platform (https://kbase.us), which provided an integrated environment for data processing and analysis²⁸. The assembly workflows are recorded in two KBase Narratives: one for the standalone second-generation sequencing assembly (https://narrative.kbase.us/narrative/175840) and one for the hybrid second- and third-generation assembly (https://narrative.kbase.us/narrative/186419).

For second-generation sequencing data, raw sequencing reads were initially quality assessed using FastQC (version 0.12.1)²⁹ to identify potential quality issues, including low-quality bases, GC content anomalies, and adapter contamination. Reads were then trimmed and quality-filtered using Trimmomatic (version 0.36)³⁰ with the following parameters: SLIDINGWINDOW:4:15, LEADING:3, TRAILING:3, CROP:140, HEADCROP:5, MINLEN:36. For third-generation sequencing data, quality control was performed using NanoPlot (version 1.32.0) to assess read quality, length distribution, and potential biases. Reads were subsequently filtered to remove low-quality reads and short sequences using NanoFilt (version 2.8.0)³¹ with a minimum quality score of 7 and a minimum read length of 500 bases.
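To make the short-read trimming parameters concrete, the sketch below applies simplified versions of the six listed steps to a single read in Python. It is an illustration, not Trimmomatic itself: the function names are hypothetical, the steps are assumed to run in the order they are listed in the text, and the sliding-window rule is reduced to cutting at the start of the first window whose mean quality falls below the threshold.

```python
def sliding_window_cut(quals, window=4, min_avg=15):
    """Return the cut index: the start of the first window whose
    mean quality is below min_avg (simplified SLIDINGWINDOW)."""
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return len(quals)


def trim_read(seq, quals, window=4, min_avg=15, leading=3, trailing=3,
              crop=140, headcrop=5, minlen=36):
    """Apply the listed steps in order.

    Returns (seq, quals) for a surviving read, or None if it is dropped.
    """
    # SLIDINGWINDOW:4:15
    end = sliding_window_cut(quals, window, min_avg)
    seq, quals = seq[:end], quals[:end]

    # LEADING:3 -- strip bases below the threshold from the 5' end
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    seq, quals = seq[start:], quals[start:]

    # TRAILING:3 -- strip bases below the threshold from the 3' end
    end = len(quals)
    while end > 0 and quals[end - 1] < trailing:
        end -= 1
    seq, quals = seq[:end], quals[:end]

    # CROP:140 -- keep at most the first 140 bases
    seq, quals = seq[:crop], quals[:crop]

    # HEADCROP:5 -- remove the first 5 bases
    seq, quals = seq[headcrop:], quals[headcrop:]

    # MINLEN:36 -- discard reads that are now too short
    if len(seq) < minlen:
        return None
    return seq, quals


# Hypothetical reads: one uniformly good, one whose quality collapses.
good = trim_read("A" * 150, [30] * 150)
decaying = trim_read("A" * 150, [30] * 40 + [2] * 110)
print(len(good[0]), decaying)
```

With these settings a 150-bp read of uniformly high quality is reduced to 135 bp by CROP and HEADCROP, whereas a read whose quality collapses after 40 bases ends up shorter than MINLEN and is discarded.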

Clean reads from the second-generation sequencing were assembled using metaSPAdes (v3.15.3; k-mer sizes: 33, 55, 77, 99, and 127)³², with a minimum contig length set to 2000 bases. Clean reads from both second- and third-generation sequencing technologies were assembled into contigs using HybridSPAdes (v3.15.3; k-mer sizes: 21, 33, and 55)³³, with a minimum contig length set to 2000 bases. Contigs longer than 2 kb were selected for metagenomic binning. Multiple binning tools were used to recover metagenome-assembled genomes (MAGs). Specifically, MetaBAT2 (v1.7)³⁴, MaxBin2 (v2.2.4)³⁵, and CONCOCT (v1.1.0)³⁶ were used for binning, each with the minimum contig length parameter set to 2000 bases. The resulting bins were then consolidated using DAS Tool (v1.1.2)³⁷ to generate a refined set of high-quality bins. The completeness and contamination of MAGs were estimated by running CheckM (v1.0.18)³⁸.

The MAGs meeting the criteria of ≥50% completeness and ≤10% contamination were subsequently clustered using dRep (v3.4.251)³⁹ at the 95% ANI threshold (-sa 0.95 -comp 50 -con 10), resulting in a total of 308 MAGs of associated bacteria. The taxonomy of each MAG was assigned using GTDB-Tk (v2.4.0)⁴⁰ based on the Genome Taxonomy Database (GTDB v220). In addition, MAGs were functionally annotated using RASTtk (v1.073)⁴¹.
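The quality thresholds applied before dereplication can be expressed as a simple filter. The Python sketch below assumes that CheckM completeness and contamination estimates have already been parsed into records; the `MagQuality` class, the function names, and the example bins are hypothetical and only illustrate the ≥50% / ≤10% rule stated above.

```python
from dataclasses import dataclass


@dataclass
class MagQuality:
    name: str
    completeness: float   # percent, e.g. as estimated by CheckM
    contamination: float  # percent


def passes_threshold(mag, min_completeness=50.0, max_contamination=10.0):
    """True if a MAG meets the medium/high-quality criteria in the text."""
    return (mag.completeness >= min_completeness
            and mag.contamination <= max_contamination)


def select_mags(mags, min_completeness=50.0, max_contamination=10.0):
    """Keep only the MAGs that would go on to dereplication."""
    return [m for m in mags
            if passes_threshold(m, min_completeness, max_contamination)]


# Hypothetical bins, not taken from the dataset.
bins = [
    MagQuality("bin.001", 98.6, 0.4),
    MagQuality("bin.002", 50.0, 10.0),   # exactly on both thresholds
    MagQuality("bin.003", 72.3, 12.5),   # too contaminated
    MagQuality("bin.004", 41.0, 1.0),    # too incomplete
]
kept = [m.name for m in select_mags(bins)]
print(kept)
```

In the workflow described above, the same cut-offs are passed to dRep through its `-comp 50 -con 10` options, so a filter like this is mainly useful for inspecting which bins would be excluded.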

Viral contig identification, dereplication, and taxonomic classification

After metagenomic assembly, contigs with lengths ≥10 kb were selected for viral identification. Putative viral contigs were identified using VirSorter2 (v2.2.1)⁴² and VIBRANT (v1.2.1)⁴³, both with default settings. Contigs with a VirSorter2 score ≥0.5 and all viral contigs detected by VIBRANT were retained. The viral contigs identified by these two pipelines were then consolidated for downstream analysis.
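The selection rule in this paragraph amounts to a length filter followed by a union of the two tools' calls. The Python sketch below assumes the tool outputs have already been reduced to a contig-to-score mapping (VirSorter2) and a set of contig names (VIBRANT); the function name and the example contigs are hypothetical.

```python
def select_viral_candidates(contig_lengths, virsorter2_scores, vibrant_hits,
                            min_length=10_000, min_score=0.5):
    """Return contigs >= min_length that are called viral by either tool.

    contig_lengths:     {contig_id: length in bp}
    virsorter2_scores:  {contig_id: VirSorter2 score}
    vibrant_hits:       iterable of contig_ids reported by VIBRANT
    """
    long_enough = {c for c, n in contig_lengths.items() if n >= min_length}
    by_virsorter2 = {c for c, s in virsorter2_scores.items() if s >= min_score}
    return sorted(long_enough & (by_virsorter2 | set(vibrant_hits)))


# Hypothetical inputs, not taken from the dataset.
lengths = {"c1": 25_000, "c2": 8_000, "c3": 12_000, "c4": 15_000}
scores = {"c1": 0.9, "c2": 0.95, "c3": 0.3}
vibrant = {"c3"}
candidates = select_viral_candidates(lengths, scores, vibrant)
print(candidates)
```

Here `c2` is excluded for being shorter than 10 kb despite its high score, `c3` is retained on the VIBRANT call alone, and `c4` is dropped because neither tool flags it.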

Subsequently, the viral contigs were subjected to quality assessment and trimming using CheckV (v0.7.0)⁴⁴. Contigs containing host genes but lacking viral genes were excluded according to the standard operating procedures of VirSorter2. After this quality check, the remaining contigs were clustered using CheckV scripts (https://bitbucket.org/berkeleylab/checkv/src/master/) to generate a non-redundant set of species-level vOTUs. Clustering was based on a 95% ANI threshold over 85% of the length of the shorter contig, following established protocols⁴⁵.
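One common way to turn the 95% ANI / 85% coverage criterion into clusters is a greedy, longest-first scheme in which each unassigned contig becomes a centroid and absorbs every remaining contig that meets both thresholds against it. The Python sketch below illustrates that idea under stated assumptions: pairwise ANI and coverage of the shorter contig are taken as precomputed inputs, the function name and data are hypothetical, and the code is not the CheckV scripts themselves.

```python
def cluster_votus(lengths, pairs, min_ani=95.0, min_cov=85.0):
    """Greedy centroid clustering of viral contigs.

    lengths: {contig_id: length in bp}
    pairs:   {(contig_a, contig_b): (ani_percent, coverage_of_shorter_percent)}
    Returns {centroid_id: [member_ids]}, centroid listed first.
    """
    # Longest contigs are considered first; ties broken by name.
    order = sorted(lengths, key=lambda c: (-lengths[c], c))
    assigned = set()
    clusters = {}
    for centroid in order:
        if centroid in assigned:
            continue
        assigned.add(centroid)
        clusters[centroid] = [centroid]
        for other in order:
            if other in assigned:
                continue
            hit = pairs.get((centroid, other)) or pairs.get((other, centroid))
            if hit and hit[0] >= min_ani and hit[1] >= min_cov:
                assigned.add(other)
                clusters[centroid].append(other)
    return clusters


# Hypothetical contigs and pairwise comparisons.
lengths = {"a": 50_000, "b": 40_000, "c": 30_000, "d": 20_000}
pairs = {
    ("a", "b"): (97.0, 90.0),  # joins a
    ("a", "c"): (94.0, 90.0),  # ANI too low
    ("b", "c"): (96.0, 88.0),  # b is not a centroid, so not used
    ("c", "d"): (99.0, 50.0),  # coverage too low
}
votus = cluster_votus(lengths, pairs)
print(votus)
```

Each key of the returned dictionary corresponds to one vOTU representative. Note that in a centroid-based scheme membership depends only on similarity to the centroid, so `c` stays separate from `a` even though it is similar to `b`.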

To assign taxonomy to the obtained vOTUs, two pipelines were employed: a protein-sharing network approach using vConTACT2 (v0.9.19)⁴⁶ and a homology-based search using BLASTp (v2.15.0)⁴⁷ alignment against viral sequences in RefSeq at the NCBI. Initially, open reading frames were predicted using Prodigal (v2.6.3)⁴⁸ with the ‘-p meta’ parameter, and the resulting protein sequences were subsequently used for taxonomic analysis. For the protein-sharing network approach, vConTACT2 was run on the KBase platform with default parameters. In parallel, the same set of viral proteins used in vConTACT2 was aligned against the RefSeq viral protein database using BLASTp. A vOTU was assigned to a specific viral family if more than 50% of its proteins matched proteins in that family with a bit-score of ≥50.
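The majority rule used for the BLASTp-based assignment can be sketched as follows. This Python example assumes that each BLASTp hit has already been annotated with the family of its subject sequence and that only the best qualifying hit per protein is counted; both points, along with the function name and the placeholder family labels, are assumptions made for illustration.

```python
from collections import Counter


def assign_family(n_proteins, hits, min_bitscore=50.0, min_fraction=0.5):
    """Assign a viral family to one vOTU, or return None.

    n_proteins: number of predicted proteins in the vOTU
    hits:       iterable of (protein_id, family, bitscore)
    A family is assigned only if MORE than min_fraction of all proteins
    have their best qualifying hit in that family.
    """
    best = {}  # protein_id -> (family, bitscore) of the best qualifying hit
    for protein_id, family, bitscore in hits:
        if bitscore < min_bitscore:
            continue
        if protein_id not in best or bitscore > best[protein_id][1]:
            best[protein_id] = (family, bitscore)

    counts = Counter(family for family, _ in best.values())
    if not counts or n_proteins == 0:
        return None
    family, n_matching = counts.most_common(1)[0]
    return family if n_matching / n_proteins > min_fraction else None


# Hypothetical vOTU with 10 proteins; family labels are placeholders.
hits_majority = [(f"p{i}", "FamilyA", 120.0) for i in range(6)]
hits_half = [(f"p{i}", "FamilyA", 120.0) for i in range(5)]
hits_weak = [(f"p{i}", "FamilyA", 42.0) for i in range(10)]

print(assign_family(10, hits_majority))
print(assign_family(10, hits_half))
print(assign_family(10, hits_weak))
```

Six of ten proteins is enough to assign the family, exactly half is not (the rule requires more than 50%), and hits below the bit-score cut-off are ignored altogether.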

Phylogenomic tree reconstruction

One hundred and twenty conserved single-copy genes were extracted from the MAGs using GTDB-Tk (v2.4.0). MUSCLE (v5.1)⁴⁹ was used to align the marker gene sequences, and trimAl (v1.4.1)⁵⁰ was then used to trim the alignments. Phylogenomic trees were constructed using IQ-TREE (v2.3.4)⁵¹ with the following options: -st AA -m LG+PMSF+G -B 1000 --bnni. The confidence of the maximum-likelihood tree was estimated using 1,000 ultrafast bootstrap replicates. In addition, 93 cyanobacterial reference sequences were included in the analysis, as detailed in Supplementary Table S4.

Data Records

Raw reads generated in this study have been deposited in the NCBI Sequence Read Archive under accession number SRP539726⁵². The metagenome-assembled genomes (MAGs) of Prochlorococcus, Synechococcus, and associated bacteria have been deposited in the NCBI BioProject database under accession number PRJNA1175454⁵³. Detailed accession numbers for these MAGs are provided in Supplementary Table S1. The cyanobacterial genomes, non-cyanobacterial MAGs and vContigs have been deposited on the figshare website (https://figshare.com/projects/Genomes_of_Prochlorococcus_Synechococcus_bacteria_and_viruses_recovered_from_marine_picocyanobacteria_cultures_based_on_Illumina_and_Qitan_nanopore_sequencing/234704)^54,55,56.

Technical Validation

All raw data processing steps, including the software and parameters used in this study, are described in the Methods section. The quality of clean reads was assessed using FastQC v0.12.1. The quality of the MAGs and vContigs was assessed using CheckM v1.0.18 and CheckV v0.7.0, respectively.

Code availability

All versions of third-party software and scripts used in this study are described and referenced accordingly in the Methods sub-sections for ease of access and reproducibility. No custom code was used for the curation and/or validation of the dataset in this study.

References

1. Cesar-Ribeiro, C., Barbosa, C. S., Terra, V. & Ghisi, N. D. C. Prochlorococcus and Synechococcus marine cyanobacteria: a scientometrics review. Lat Am J Aquat Res. 51, 556–569 (2023).
2. Flombaum, P. et al. Present and future global distributions of the marine Cyanobacteria Prochlorococcus and Synechococcus. Proc Natl Acad Sci USA. 110, 9824–9829 (2013).
3. Rocap, G. et al. Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature. 424, 1042–1047 (2003).
4. Hao, Z. & Ferdi, L. Genome reduction occurred in early Prochlorococcus with an unusually low effective population size. ISME J. 18, 1–7 (2024).
5. Biller, S. J., Berube, P. M., Lindell, D. & Chisholm, S. W. Prochlorococcus: the structure and function of collective diversity. Nat Rev Microbiol. 13, 13–27 (2015).
6. Zhang, T. et al. Genomic insights into the adaptation of Synechococcus to the coastal environment on Xiamen. Front Microbiol. 14, 1292150 (2023).
7. Scanlan, D. J. et al. Ecological Genomics of Marine Picocyanobacteria. Microbiol Mol Biol Rev. 73, 249–299 (2009).
8. Lee, M. D. et al. Marine Synechococcus isolates representing globally abundant genomic lineages demonstrate a unique evolutionary path of genome reduction without a decrease in GC content. Environ Microbiol. 21, 1677–1686 (2019).
9. Mackey, K. R. M. et al. Effect of Temperature on Photosynthesis and Growth in Marine Synechococcus spp. Plant Physiol. 163, 815–829 (2013).
10. Sohm, J. A. et al. Co-occurring Synechococcus ecotypes occupy four major oceanic regimes defined by temperature, macronutrients and iron. ISME J. 10, 333–345 (2016).
11. Mella-Flores, D. et al. Prochlorococcus and Synechococcus have Evolved Different Adaptive Mechanisms to Cope with Light and UV Stress. Front Microbiol. 3 (2012).
12. Yan, W., Feng, X., Zhang, W., Zhang, R. & Jiao, N. Research advances on ecotype and sub-ecotype differentiation of Prochlorococcus and its environmental adaptability. Sci China Earth Sci. 63, 1691–1700 (2020).
13. Yan, W. et al. Diverse Subclade Differentiation Attributed to the Ubiquity of Prochlorococcus High-Light-Adapted Clade II. mBio. 13, e03027-21 (2022).
14. Berube, P. M. et al. Single cell genomes of Prochlorococcus, Synechococcus, and sympatric microbes from diverse marine environments. Sci Data. 5, 180154 (2018).
15. Yan, W. et al. Genomes of Diverse Isolates of Prochlorococcus High-Light-Adapted Clade II in the Western Pacific Ocean. Front Mar Sci. 7, 619826 (2021).
16. Biller, S. J. et al. Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus. Sci Data. 1, 140034 (2014).
17. Seneviratne, G. & Zavahir, J. S. Role of Microbial Communities for Sustainability (Springer, 2021).
18. Suttle, C. A. Marine viruses — major players in the global ecosystem. Nat Rev Microbiol. 5, 801–812 (2007).
19. Xu, Z. et al. Coevolution between marine Aeromonas and phages reveals temporal trade-off patterns of phage resistance and host population fitness. ISME J. 17, 2200–2209 (2023).
20. Sullivan, M. B. et al. Prevalence and Evolution of Core Photosystem II Genes in Marine Cyanobacterial Viruses and Their Hosts. PLoS Biol. 4, e234 (2006).
21. Park, H. et al. Uncovering the genomic basis of symbiotic interactions and niche adaptations in freshwater picocyanobacteria. Microbiome. 12, 150 (2024).
22. Coello-Camba, A. et al. Picocyanobacteria Community and Cyanophage Infection Responses to Nutrient Enrichment in a Mesocosms Experiment in Oligotrophic Waters. Front Microbiol. 11, 1153 (2020).
23. Berta-Thompson, J. W. et al. Draft genomes of three closely related low light-adapted Prochlorococcus. BMC Genomic Data. 24, 11 (2023).
24. Haro-Moreno, J. M., López-Pérez, M. & Rodriguez-Valera, F. Enhanced Recovery of Microbial Genes and Genomes From a Marine Water Column Using Long-Read Metagenomics. Front Microbiol. 12, 708782 (2021).
25. Ayling, M., Clark, M. D. & Leggett, R. M. New approaches for metagenome assembly with short reads. Brief Bioinform. 21, 584–594 (2020).
26. Chisholm, S. W. et al. Prochlorococcus marinus nov. gen. nov. sp.: an oxyphototrophic marine prokaryote containing divinyl chlorophyll a and b. Arch Microbiol. 157, 297–300 (1992).
27. Moore, L. R. et al. Culturing the marine cyanobacterium Prochlorococcus. Limnol Oceanogr-Meth. 5, 353–362 (2007).
28. Arkin, A. P. et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nat Biotechnol. 36, 566–569 (2018).
29. Brown, J., Pirrung, M. & McCue, L. A. FQC Dashboard: integrates FastQC results into a web-based, interactive, and extensible FASTQ quality control tool. Bioinformatics. 33, 3137–3139 (2017).
30. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30, 2114–2120 (2014).
31. De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M. & Van Broeckhoven, C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 34, 2666–2669 (2018).
32. Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).
33. Antipov, D., Korobeynikov, A., McLean, J. S. & Pevzner, P. A. hybridSPAdes: an algorithm for hybrid assembly of short and long reads. Bioinformatics. 32, 1009–1015 (2016).
34. Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. 7, e7359 (2019).
35. Wu, Y. W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 32, 605–607 (2016).
36. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat Methods. 11, 1144–1146 (2014).
37. Sieber, C. M. K. et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol. 3, 836–843 (2018).
38. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
39. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).
40. Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 36, 1925–1927 (2020).
41. Brettin, T. et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 5, 8365 (2015).
42. Guo, J. et al. VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses. Microbiome. 9, 37 (2021).
43. Kieft, K., Zhou, Z. & Anantharaman, K. VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences. Microbiome. 8, 90 (2020).
44. Nayfach, S. et al. CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat Biotechnol. 39, 578–585 (2021).
45. Liu, Y. Diversity and function of mountain and polar supraglacial DNA viruses. Sci Bull. 68, 2418–2433 (2023).
46. Jang, H. B. et al. Gene sharing networks to automate genome-based prokaryotic viral taxonomy. Nat Biotechnol. 37, 632–639 (2019).
47. Ye, J., McGinnis, S. & Madden, T. L. BLAST: improvements for better sequence analysis. Nucleic Acids Res. 34, W6–W9 (2006).
48. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 11, 119 (2010).
49. Edgar, R. C. Muscle5: High-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny. Nat Commun. 13, 6968 (2022).
50. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 25, 1972–1973 (2009).
51. Minh, B. Q. et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol Biol Evol. 37, 1530–1534 (2020).
52. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP539726 (2024).
53. NCBI Sequence Read Archive https://identifiers.org/ncbi/bioproject:PRJNA1175454 (2025).
54. Wu, Q. et al. Genomes of picocyanobacteria Recovered from marine picocyanobacteria cultures. figshare. https://doi.org/10.6084/m9.figshare.28246127.v1 (2025).
55. Wu, Q. et al. Genomes of Bacteria Recovered from marine picocyanobacteria cultures. figshare. https://doi.org/10.6084/m9.figshare.28235138.v1 (2025).
56. Wu, Q. et al. Viral Contigs Identified from marine picocyanobacteria cultures. figshare. https://doi.org/10.6084/m9.figshare.28235789.v1 (2025).

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grants 42188102 and 42293292). This paper contributes to the science plan of the Ocean Negative Carbon Emissions (ONCE) Program. We thank the core facility and technical support at Wuhan Institute of Virology for assistance with DNA extraction, sequencing and analysis. We would also like to express our gratitude to all the members who conducted field sampling, whose efforts were crucial to the success of this study. Samples from the South China Sea were collected onboard the R/V TAN KAH KEE during the open research cruise NORC2021-06, supported by the NSFC Shiptime Sharing Project (Grant No. 42049906). Samples from the Indian Ocean were collected onboard the R/V Shiyan 6 during the open research cruise NORC2022-10 + NORC2022-303, supported by the NSFC Shiptime Sharing Projects (Grant No. 42149910). Samples from the western Pacific Ocean were collected onboard the R/V KeXue during the open research cruise NORC2021-09, supported by the NSFC Shiptime Sharing Project (Grant No. 42049909).

Author information

These authors contributed equally: Qingtao Wu, Jie Gao, Boxuan Sa.

Authors and Affiliations

College of Marine Science and Technology, China University of Geosciences, Wuhan, 430074, China
Qingtao Wu, Boxuan Sa, Wenjie Deng, Jinyu Zhang, Liduo Wang & Wei Yan
Computational Virology Group, Etiology Research Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, 430071, China
Qingtao Wu, Jie Gao, Ying Zhang, Haizhou Liu, Yi Yan & Di Liu
College of Animal Science and Technology, Guangxi University, Nanning, 530004, China
Qingtao Wu & Di Liu
University of Chinese Academy of Sciences, Beijing, 101408, China
Jie Gao, Yi Yan & Di Liu
State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, 361102, China
Hongtao Cong, Wenjie Deng & Xiaojie Zhong
Carbon Neutral Innovation Research Center, Xiamen University, Global ONCE Program, Xiamen, 361005, China
Hongtao Cong, Wenjie Deng, Xiaojie Zhong & Wei Yan
School of Life and Health Science, Hunan University of Science and Technology, Xiangtan, 411201, China
Ying Zhang, Yifei Zhang & Di Liu

Contributions

W.Y. and D.L. conceived this study. B.S., W.D., J.G., J.Z. and Y.Z. were responsible for the cultivation of picocyanobacteria and DNA extraction. Q.W., J.G., B.S., H.C., W.D., X.Z. and L.W. performed metagenomic data analysis. Q.W. wrote the first draft under the supervision of W.Y. and D.L. W.Y., D.L., YF.Z., H.L. and Y.Y. revised the draft. All authors reviewed and contributed to the final version of the manuscript.

Corresponding authors

Correspondence to Di Liu or Wei Yan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Table_S2.xlsx

Table_S3.xlsx

Table_S4.xlsx

Table_S1.xlsx

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wu, Q., Gao, J., Sa, B. et al. Genomes of Prochlorococcus, Synechococcus, bacteria, and viruses recovered from marine picocyanobacteria cultures based on Illumina and Qitan nanopore sequencing. Sci Data 12, 612 (2025). https://doi.org/10.1038/s41597-025-04762-x

Received: 13 November 2024
Accepted: 05 March 2025
Published: 12 April 2025
DOI: https://doi.org/10.1038/s41597-025-04762-x