Genome Note

A new genome sequence resource for five invasive fruit flies of agricultural concern: Ceratitis capitata, C. quilicii, C. rosa, Zeugodacus cucurbitae and Bactrocera zonata (Diptera, Tephritidae)

[version 1; peer review: 2 approved]
PUBLISHED 06 Dec 2024

This article is included in the Agriculture, Food and Nutrition gateway.

Abstract

Here, we present novel high-quality genome assemblies for five invasive tephritid species of agricultural concern: Ceratitis capitata, C. quilicii, C. rosa, Zeugodacus cucurbitae and Bactrocera zonata (read depths between 65x and 78x). Three assemblies (C. capitata, C. quilicii and Z. cucurbitae) were scaffolded with chromosome conformation data and annotated using RNAseq reads. For three species (B. zonata, C. quilicii and C. rosa) this is the first reference genome available; for the other two (C. capitata and Z. cucurbitae) we provide improved annotated genomes. Together, the new references provide an important resource to advance research on genetic techniques for population control, develop rapid species identification methods, and support eco-evolutionary studies.

Keywords

genome assembly, invasive species, fruit fly, tephritidae, pest

Introduction

A significant number of phytophagous insects within the dipteran family Tephritidae (the “true” fruit flies) are considered serious pests of fruits and vegetables worldwide (White & Elson-Harris 1992). Globalization has led to a surge in intercontinental trade and movement, and has increased the number of incursions of harmful non-native fruit fly species (Bragard et al. 2020). Many countries have put costly and elaborate phytosanitary measures in place to prevent the entry and establishment of harmful fruit fly species (Bragard et al. 2020; Papadopoulos et al. 2023a, 2023b). Making resources available that give researchers better tools for studying fruit fly pests is becoming increasingly important. Agricultural areas with a suitable climate for fruit fly pests are rapidly increasing around the globe (Sultana et al. 2020), changing the distribution patterns of fruit fly pests (Ni et al. 2011). In recent years this has led to more fruit fly incursions and first detections of new fruit fly species in several countries, e.g. B. dorsalis in France, Italy and Belgium, and B. zonata in France (EPPO alert list, https://www.eppo.int/ACTIVITIES/plant_quarantine/alert_list).

Here, we present high-quality reference genome assemblies for five tephritids of agricultural importance (Figure 1a): Ceratitis capitata (Wiedemann), C. quilicii (De Meyer, Mwatawala & Virgilio), C. rosa (Karsch), Zeugodacus cucurbitae (Coquillett) and Bactrocera zonata. For three of the five species (C. quilicii, C. rosa and B. zonata), no genome assembly was previously available in public databases, so the assemblies presented here could provide a major step forward in accumulating knowledge on those species. Genome assemblies are a valuable resource for both fundamental and applied research and can facilitate the development of new and sustainable pest management methods. The highly contiguous and complete genomes presented here will make it easier for researchers to find specific genes of interest and to investigate changes in genomic architecture. The new assemblies will enable researchers to tackle questions regarding climate adaptation, host and range expansion and niche shifts (Papanicolaou et al. 2016).


Figure 1. Linkage and BUSCO completeness across the genomes presented here and phylogenetic analysis.

(a) Photographs of the five fruit fly pest species from a dorsal and lateral view © RMCA (Royal Museum for Central Africa). (b) Hi-C (Dovetail™ Omni-C™) contact map for three tephritid species showing which reads are in close proximity to each other, revealing the linear representation of the scaffolds/chromosomes within the genome. (c) Phylogenetic tree of the three annotated tephritid fruit flies and five other dipteran species. (d) BUSCO completeness results for each of the assembled tephritid genomes.

Results and discussion

PacBio CCS reads covered the genome between 65 and 78 times, assuming a genome size of 0.5 Gb (Table 1), for the five fruit fly species shown in Figure 1a. A BUSCO search against the Diptera database, using the duplicate-purged PacBio assemblies, indicated genome completeness between 94.6% (B. zonata) and 98.8% (C. capitata) for the five novel assemblies (Figure 1d). Total assembly lengths ranged from 410 Mb (Z. cucurbitae) to 889 Mb (C. quilicii), with L50 values ranging from 3 (B. zonata) to 63 (C. quilicii) (Table 1). BlobToolKit results for identifying contaminants are shown in Figures S1-S5 (see extended data, accessible at https://zenodo.org/records/14186560). Physical pairing between chromatin regions is shown in Figure 1b for C. capitata, C. quilicii and Z. cucurbitae.
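The coverage figures quoted above follow from dividing the sequenced bases by the assumed genome size. The short Python sketch below reproduces that arithmetic from the yields in Table 1; the function and variable names are illustrative and not part of the authors' pipeline.

```python
# Sequencing yield in gigabases per species, taken from Table 1.
yield_gb = {
    "Ceratitis capitata": 38.8,
    "Ceratitis quilicii": 38.4,
    "Ceratitis rosa": 36.2,
    "Zeugodacus cucurbitae": 32.4,
    "Bactrocera zonata": 32.4,
}

ASSUMED_GENOME_SIZE_GB = 0.5  # genome size assumed in the text


def fold_coverage(total_gb, genome_size_gb=ASSUMED_GENOME_SIZE_GB):
    """Return fold coverage: sequenced bases divided by genome size."""
    return total_gb / genome_size_gb


# Round to whole-number coverage, as reported in Table 1.
coverage = {sp: round(fold_coverage(gb)) for sp, gb in yield_gb.items()}

for species, cov in coverage.items():
    print(f"{species}: {cov}x")
```

Note that these values depend on the assumed 0.5 Gb genome size; dividing by an assembled length larger than 0.5 Gb would give a correspondingly lower figure.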

Table 1. Comparison of assembly statistics.

                       PacBio CCS read data                Genome length and contiguity
Species                Number of Reads  Bp (Gb)  Coverage  Total Length (bp)  N50         L50  N90         L90
Ceratitis capitata     2,563,883        38.8     78x       699,814,289        8,193,440   25   817,442     121
Ceratitis quilicii     2,498,505        38.4     77x       889,108,370        4,088,374   63   278,990     378
Ceratitis rosa         2,451,133        36.2     72x       650,940,389        17,660,867  12   966,435     75
Zeugodacus cucurbitae  2,332,131        32.4     65x       410,169,932        14,989,679  8    4,412,789   28
Bactrocera zonata      2,522,876        32.4     65x       524,894,629        99,542,525  3    22,789,729  6
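
For readers less familiar with the contiguity statistics in Table 1, the following Python sketch shows how N50/L50 and N90/L90 are conventionally computed from a list of contig lengths. The contig lengths used here are a hypothetical toy example, not data from this study, and the function name is illustrative.

```python
def nx_lx(lengths, fraction):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx is the length of the shortest contig in the smallest set of
    longest contigs that together cover `fraction` of the assembly;
    Lx is the number of contigs in that set.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("empty list of contig lengths")


# Hypothetical toy assembly (not data from this study).
toy_contigs = [50, 40, 30, 20, 10, 5, 5]
n50, l50 = nx_lx(toy_contigs, 0.5)
n90, l90 = nx_lx(toy_contigs, 0.9)
print(f"N50={n50}, L50={l50}, N90={n90}, L90={l90}")
```

A larger N50 combined with a smaller L50 indicates that fewer, longer contigs make up half of the assembly.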

The annotated genomes comprise 32,449; 38,590 and 31,422 genes in total for C. capitata, C. quilicii and Z. cucurbitae respectively, with a total coding region length (bp) of 39,037,294; 46,768,995 and 41,286,253. The average gene length (bp) is 1,203.04; 1,211.95 and 1,313.93 for C. capitata, C. quilicii and Z. cucurbitae respectively. The most recent C. capitata assembly available on NCBI (GCA_905071925.1, published in November 2020) contains 14,054 genes; this novel assembly thus substantially increases the number of annotated genes in the C. capitata genome. The same can be observed in Z. cucurbitae, where the most recent NCBI reference assembly (GCF_028554725.1) comprises only 17,225 genes. In the Ceratitis species, however, a substantial proportion of BUSCOs are duplicated, which suggests the presence of redundant sequences resulting from partial misassemblies. We therefore recommend caution when comparing the Ceratitis assemblies with other assemblies.
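
The reported average gene lengths are consistent with dividing the total coding region length by the number of genes. The Python sketch below checks this arithmetic using only the figures quoted above; the dictionary layout and names are illustrative.

```python
# Gene counts and total coding region lengths (bp) as reported in the text.
annotation = {
    "C. capitata": {"genes": 32_449, "coding_bp": 39_037_294},
    "C. quilicii": {"genes": 38_590, "coding_bp": 46_768_995},
    "Z. cucurbitae": {"genes": 31_422, "coding_bp": 41_286_253},
}

# Mean length per gene = total coding length / number of genes.
mean_gene_length = {
    species: round(stats["coding_bp"] / stats["genes"], 2)
    for species, stats in annotation.items()
}

for species, length in mean_gene_length.items():
    print(f"{species}: {length:,.2f} bp")
```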

A total of 19,480 gene orthogroups were found using OrthoFinder, and a total of 32,051; 37,950 and 31,009 genes could be attributed to an orthogroup for C. capitata, C. quilicii and Z. cucurbitae respectively. Using these orthogroups as evidence, we estimated that the Tephritidae-Drosophilidae split took place around 120 MYA (Figure 1c), which is in line with the estimates of Russo et al. (2013), who constructed a drosophilid time tree with two tephritid species as outgroups (C. capitata and B. oleae) and estimated the split at around 110 MYA.

We believe that our contribution will substantially impact tephritid genome research and provide new opportunities for comparative genomics with a focus on characterizing genes related to invasiveness.

Methods

De novo genome assembly

An inbred lab colony of each of the following tephritid species was established in an artificial setting and larvae were collected for subsequent sequencing: Ceratitis capitata, C. quilicii, C. rosa, Zeugodacus cucurbitae and Bactrocera zonata. Inbred specimens of C. quilicii, C. capitata and C. rosa were produced at Citrus Research International in Mbombela and were originally sourced from wild flies collected in Ermelo (-26.516021, 29.996168), Burgershall (-25.112083, 31.087778) and Mbombela (-25.452258, 30.970778) respectively, all in Mpumalanga Province, South Africa, in 2020 (C. rosa) and 2021 (C. capitata and C. quilicii). Species identity was confirmed by Marc De Meyer (C. quilicii) and Aruna Manrakhan (C. capitata and C. rosa). Inbred lines of Z. cucurbitae and B. zonata had already been maintained at the facilities of CIRAD, Réunion, for more than 150 generations and could thus be used for our purposes. Pupae of all species supplied for sequencing originate from a parent × F1 backcross to increase homozygosity. The sequencing and assembly process consisted of three consecutive steps: generation of PacBio CCS reads and primary assembly with Hifiasm; generation of Hi-C reads (specifically, Dovetail™ Omni-C™ reads) coupled with secondary assembly using HiRise; and lastly, generation of an RNAseq library for ab initio genome annotation. Only the assemblies of C. capitata, C. quilicii and Z. cucurbitae went through the HiRise scaffolding and annotation steps.

De novo PacBio assembly and filtering

A de novo assembly was constructed for each species from 32.4 to 38.8 Gb of PacBio CCS reads, corresponding to 65x to 78x coverage of the tephritid genome (Table 1). The obtained PacBio reads were used as input to Hifiasm v0.15.4-r347 (Cheng et al. 2021) with default parameters. BLAST results of the Hifiasm output assembly against the nucleotide BLAST database (https://blast.ncbi.nlm.nih.gov/) were used as input for blobtools v1.1.1 (Laetsch and Blaxter 2017), and scaffolds identified as possible contamination were removed from the assembly. Finally, purge_dups v1.2.5 (Guan et al. 2020) was used to purge haplotigs and contig overlaps. The completeness of the final assembly was checked with BUSCO using the diptera_odb10 dataset (Manni et al. 2021).
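
The contamination-removal step amounts to dropping flagged sequences from the assembly FASTA. The Python sketch below illustrates that operation only, under the assumption that a set of flagged sequence IDs has already been derived from the blobtools results; the IDs, sequences and function name are hypothetical, and this is not the authors' actual script.

```python
def filter_fasta(lines, flagged_ids):
    """Yield FASTA lines, dropping every record whose ID is in flagged_ids."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # The record ID is the header text up to the first whitespace.
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            keep = record_id not in flagged_ids
        if keep:
            yield line


# Hypothetical toy assembly and contaminant list (not data from this study).
toy_assembly = [
    ">contig_1 len=4", "ACGT",
    ">contig_2", "GGCC", "TTAA",
    ">contig_3", "AAAA",
]
flagged = {"contig_2"}

cleaned = list(filter_fasta(toy_assembly, flagged))
print("\n".join(cleaned))
```

Because the function works on any iterable of lines, it could in principle be applied to an open file handle without loading the whole assembly into memory.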

Chromosome conformation capture and HiRise scaffolding

To construct a Dovetail™ Omni-C™ library, chromatin was fixed in place with formaldehyde in the nucleus and then extracted. Fixed chromatin was digested with DNase I, chromatin ends were repaired and ligated to a biotinylated bridge adapter, followed by proximity ligation of adapter-containing ends. After proximity ligation, crosslinks were reversed and the DNA was purified. Purified DNA was treated to remove biotin that was not internal to ligated fragments. Sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adapters. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of each library. The library was sequenced on an Illumina HiSeqX platform to produce approximately 30x sequence coverage.

The de novo assembly and the Dovetail™ Omni-C™ library reads (MQ > 50) were used as input data for HiRise, a software pipeline designed specifically for using proximity ligation data to scaffold genome assemblies (Putnam et al. 2016). Dovetail™ Omni-C™ library sequences were aligned to the draft input assembly using bwa (https://github.com/lh3/bwa). The separations of Dovetail™ Omni-C™ read pairs mapped within draft scaffolds were analyzed by HiRise to produce a likelihood model for genomic distance between read pairs, and the model was used to identify and break putative misjoins, to score prospective joins, and to make joins above a threshold.

Ab initio genome annotation

Firstly, repeat families in the three tephritid genome assemblies (C. capitata, C. quilicii and Z. cucurbitae) were identified de novo and classified using the software package RepeatModeler2 (Flynn et al. 2020; the original version of RepeatModeler is free and available at https://github.com/Dfam-consortium/RepeatModeler/blob/master/RepeatModeler). The custom repeat library obtained from RepeatModeler2 was used to discover, identify and mask the repeats in the assembly using RepeatMasker (version 4.1.0, available at https://github.com/rmhubley/RepeatMasker). Secondly, coding sequences from Bactrocera dorsalis, Ceratitis capitata and Drosophila melanogaster available on GenBank were used to train the ab initio model in AUGUSTUS (version 2.5.5) by performing six rounds of optimization. Likewise, the same coding sequences were used to train an independent ab initio gene model using SNAP (Korf 2004). Furthermore, RNAseq reads were mapped onto the genome using the STAR aligner software (Dobin et al. 2013). MAKER (Campbell et al. 2014), SNAP and AUGUSTUS (with intron-exon boundary hints provided from RNAseq) were then used to predict genes in the repeat-masked reference genome. To help guide the prediction process, SwissProt peptide sequences from the UniProt database (https://www.uniprot.org/) were downloaded and used in conjunction with the protein sequences from the aforementioned species to generate peptide evidence in the MAKER pipeline (Campbell et al. 2014). Only genes that were predicted by both SNAP and AUGUSTUS were retained in the final gene sets. To help assess the quality of the gene prediction, AED scores were generated for each of the predicted genes as part of the MAKER pipeline. Genes were further characterised for their putative function by performing a BLAST (Ye et al. 2006) search of the peptide sequences against the UniProt database.
tRNAs were predicted using the software tRNAscan-SE (Lowe & Chan 2016, available at: https://lowelab.ucsc.edu/tRNAscan-SE/).
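
The rule of retaining only genes predicted by both SNAP and AUGUSTUS can be pictured as a consensus filter. The text does not state how agreement between the two predictors was determined, so the Python sketch below makes an explicit assumption: two predictions agree when they overlap on the same scaffold and strand. All coordinates, names and the matching criterion are hypothetical illustrations rather than the MAKER implementation.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    scaffold: str
    start: int   # 1-based, inclusive (assumed convention)
    end: int     # 1-based, inclusive (assumed convention)
    strand: str


def overlaps(a, b):
    """True if two predictions share scaffold, strand and at least one base."""
    return (
        a.scaffold == b.scaffold
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def consensus_genes(primary, secondary):
    """Keep predictions in `primary` supported by any prediction in `secondary`."""
    return [g for g in primary if any(overlaps(g, s) for s in secondary)]


# Hypothetical toy predictions (not data from this study).
augustus = [
    Gene("scaffold_1", 100, 900, "+"),
    Gene("scaffold_1", 2000, 2600, "-"),
    Gene("scaffold_2", 50, 400, "+"),
]
snap = [
    Gene("scaffold_1", 150, 950, "+"),    # overlaps the first AUGUSTUS gene
    Gene("scaffold_1", 2100, 2500, "+"),  # same region, opposite strand
    Gene("scaffold_2", 500, 800, "+"),    # same scaffold, no overlap
]

consensus = consensus_genes(augustus, snap)
print(consensus)
```

The pairwise comparison is quadratic in the number of predictions, which is fine for illustration; a genome-scale version would typically sort the intervals or use an interval index.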

Phylogenetic tree reconstruction

We inferred orthogroups using OrthoFinder v2.5.5 (Emms & Kelly 2019) for the three fruit fly species with an annotated genome assembly in this study (C. capitata, C. quilicii and Z. cucurbitae). In addition, we downloaded protein sequence data for Drosophila melanogaster Meigen (GCA_000001215.4), Anopheles darlingi Root (GCA_000211455.3), Musca domestica Linnaeus (GCF_030504385.1), Rhagoletis pomonella (Walsh) (GCF_013731165.1) and Bactrocera tryoni (Froggatt) (GCF_016617805.1). Sequences were aligned using DIAMOND and gene trees were inferred using FastTree. The STAG algorithm combined with the STRIDE rooting method, both implemented in OrthoFinder, was then used to infer a species tree with realistic branch lengths from the full set of gene trees (Emms & Kelly 2017). A time-calibrated tree was constructed by transforming the species tree rendered by OrthoFinder into an ultrametric tree and calibrating it based on the split between A. darlingi and the rest of the taxa (240.8 MYA) as inferred from TIMETREE5 (timetree.org).
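
With a single calibration point, dating an ultrametric tree reduces to rescaling all node heights so that the calibrated node matches its known age. The Python sketch below illustrates that rescaling under the assumption that relative node heights are already available; the node names and relative heights are hypothetical, and the software used by the authors for this step is not specified in the text.

```python
CALIBRATION_AGE_MYA = 240.8  # A. darlingi vs. remaining taxa, from the text


def calibrate(node_heights, calibration_node, calibration_age=CALIBRATION_AGE_MYA):
    """Rescale relative node heights so the calibration node has the given age."""
    scale = calibration_age / node_heights[calibration_node]
    return {node: height * scale for node, height in node_heights.items()}


# Hypothetical relative heights of internal nodes in an ultrametric tree
# (arbitrary units; not values from this study).
relative_heights = {
    "root": 2.0,     # split between the outgroup and all other taxa
    "node_A": 0.8,
    "node_B": 0.25,
}

dated = calibrate(relative_heights, "root")
for node, age in dated.items():
    print(f"{node}: {age:.2f} MYA")
```

Because every node is multiplied by the same factor, the relative ordering and proportions of divergence times are preserved; only the absolute time scale is fixed by the calibration.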

Author contributions

PD, SV, LE, MDM, MV (RMCA, BE) – Conceptualization, funding acquisition, original draft preparation and data submission.

PA, JT, MK (SU, ZA) – Conceptualization, development of the inbred lines, provision of field samples, review and editing.

AM (CRI, ZA) - Conceptualization, development of the inbred lines, provision of field samples, review and editing.

DC, LC (EMU, MZ), LB (National FF lab, MZ) - Conceptualization, provision of field samples, review and editing.

MM, RM, AK, JT (SUA, TZ), JB (UDOM, TZ) - Conceptualization, review and editing.

HD (CIRAD – La Réunion, FR) - Conceptualization, funding acquisition, development of the inbred lines, provision of field samples, review and editing.

how to cite this article
Deschepper P, Vanbergen S, Esselens L et al. A new genome sequence resource for five invasive fruit flies of agricultural concern: Ceratitis capitata, C. quilicii, C. rosa, Zeugodacus cucurbitae and Bactrocera zonata (Diptera, Tephritidae) [version 1; peer review: 2 approved]. F1000Research 2024, 13:1492 (https://doi.org/10.12688/f1000research.157946.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
Open Peer Review

Reviewer Report 28 Feb 2025
Lucio Navarro Escalante, The University of Texas at Austin, Austin, USA 
Approved
Report:
The manuscript reports the sequencing and assembly for the genomes of five tephritid fly species of agricultural importance (Ceratitis capitata, C. quilicii, C. rosa, Bactrocera zonata and Zeugodacus cucurbitae). PacBio sequences were used to assemble contig level high ...
Escalante LN. Reviewer Report For: A new genome sequence resource for five invasive fruit flies of agricultural concern: Ceratitis capitata, C. quilicii, C. rosa, Zeugodacus cucurbitae and Bactrocera zonata (Diptera, Tephritidae) [version 1; peer review: 2 approved]. F1000Research 2024, 13:1492 (https://doi.org/10.5256/f1000research.173473.r365152)
Reviewer Report 26 Feb 2025
Craig Wilding, Liverpool John Moores University, Liverpool, England, UK 
Approved
Deschepper et al. describe the production of high-quality chromosomal assemblies for each of five invasive fruit flies. This is an excellent body of work comprising not just PacBio sequencing, but Hi-C data.
The methods used are both clearly described ...
Wilding C. Reviewer Report For: A new genome sequence resource for five invasive fruit flies of agricultural concern: Ceratitis capitata, C. quilicii, C. rosa, Zeugodacus cucurbitae and Bactrocera zonata (Diptera, Tephritidae) [version 1; peer review: 2 approved]. F1000Research 2024, 13:1492 (https://doi.org/10.5256/f1000research.173473.r365150)