Genome assembly at chromosome scale with telomere ends for Pearlspot, Etroplus suratensis

Katneni, Vinaya Kumar; Krishnan, Karthic; Prabhudas, Sudheesh K; Jayaraman, Roja; Quraishi, Nida; Vasagam, Kumaraguru; Jangam, Ashok Kumar; Angel, Jesudhas Raymond Jani; Kaikkolante, Nimisha; Jayaraman, Kumaravel; Mudagandur, S Shekhar

doi:10.1038/s41597-024-04096-0

Data Descriptor
Open access
Published: 13 November 2024

Vinaya Kumar Katneni ORCID: orcid.org/0000-0002-3112-0465¹,
Karthic Krishnan¹,
Sudheesh K Prabhudas ORCID: orcid.org/0000-0002-8888-7157¹,
Roja Jayaraman¹,
Nida Quraishi¹,
Kumaraguru Vasagam²,
Ashok Kumar Jangam¹,
Jesudhas Raymond Jani Angel³,
Nimisha Kaikkolante¹,
Kumaravel Jayaraman² &
S Shekhar Mudagandur⁴

Scientific Data volume 11, Article number: 1226 (2024)

Abstract

The pearlspot, Etroplus suratensis, is a climate-resilient cichlid fish that exhibits unusual adaptation to salinity. The fish can complete its full life cycle in habitats of diverse salinity, ranging from fresh water to marine environments. For the first time, high-quality primary and phased genome assemblies were generated for pearlspot using PacBio HiFi and Arima HiC sequencing technologies. The primary assembly is highly contiguous, with a contig N50 length of 36 Mb. The final assembly is 1.247 Gb long, with an N50 length of 51.57 Mb and 98% of the genome length anchored to 24 chromosomes. The genome was assessed to be 99.9% complete based on BUSCO evaluation and was predicted to contain 52.96% repeat elements. We predicted 27,192 protein-encoding genes, of which 21,580 were functionally annotated. The genome offers an invaluable resource for understanding the adaptation of pearlspot to diverse salinity habitats.

Background and Summary

Cichlid fishes, characterized by rapid adaptive radiation and sympatric speciation¹, serve as excellent model species for understanding divergent evolution. The existence of a vast number of closely related cichlid species displaying wide phenotypic variation within the confines of a single geographical environment² makes them an ideal model system for understanding the genetic basis of vertebrate speciation³. While the genomic resources of African cichlid fishes of the subfamily Pseudocrenilabrinae are being used extensively to understand vertebrate speciation, the resources for Asian cichlid fishes of the subfamily Etroplinae are scarce.

Etroplus suratensis (Bloch, 1790), commonly known as pearlspot or green chromide, is an edible fish of the subfamily Etroplinae. This substrate-spawning cichlid fish (Fig. 1) is characterized by elaborate courtship and multiple parental care⁴. Although brackish water is the principal habitat of this herbivorous fish, it displays great adaptation to salinity by also surviving and breeding in freshwater habitats. Only two species of the subfamily Etroplinae, Etroplus canarensis and Paratilapia polleni, have draft genome assemblies available in a public repository⁵. Both of these assemblies were generated with short DNA reads and have thousands of scaffolds with N50 lengths of around 20 Kb. Therefore, no chromosome-scale reference genome is currently available for the subfamily Etroplinae.

In this study, we used PacBio HiFi technology to generate highly contiguous assembly contigs for pearlspot, with an assembly length of 1.276 Gb and an N50 length of 36.16 Mb. Arima HiC technology was then used to order and orient the contigs into 24 chromosome-scale scaffolds (Fig. 2). The combination of these two sequencing technologies resulted in a chromosome-scale genome assembly of 1.247 Gb in 117 scaffolds with an N50 length of 51.57 Mb (Table 1). The assembly length is closer to the estimate obtained with the flow cytometry method (1.195 Gb) than to the estimate obtained with k-mer based analysis (1.103 Gb) of short DNA reads (Fig. 3). A k-mer based analysis estimated a consensus quality value (QV) of 60 for the genome assembly, and 98% of the assembly length was anchored to 24 chromosomes with telomere ends. Additionally, the two haplotype-resolved assemblies generated using a combination of HiFi and HiC reads are 1.242 Gb and 1.225 Gb long, with N50 lengths of 51.89 Mb and 51.40 Mb, respectively.
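For readers less familiar with the N50 statistic quoted above, the following is a minimal Python sketch of how it is commonly computed: the sequence lengths are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the total assembly length. The function name and the toy lengths are illustrative assumptions; they are not taken from the pearlspot assembly or from any tool used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running total to at least
        # half of the assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy lengths (hypothetical values, in arbitrary units).
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))  # prints 40: 50 + 40 = 90 reaches half of 150
```

A higher N50 for the same total length means that more of the assembly sits in long sequences, which is why the statistic is used here as a measure of contiguity.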

Table 1 Genome assembly statistics.

A custom repeat library consisting of 2,112 repeat families, obtained through de novo modelling of repeat elements in the assembly, was used to identify and classify the repeat elements in the pearlspot genome. The repeat elements accounted for 52.96% of the genome (Table 2), predominantly LINEs (20.2%), DNA transposons (16.71%) and LTR elements (3.85%). A strategy that combines evidence generated using Illumina RNAseq reads, PacBio Iso-Seq reads, ab initio methods and predicted proteins from related-species genomes resulted in the prediction of 27,192 protein-encoding genes (PEGs) in the pearlspot genome (Table 3). Further, 18,089 non-coding RNAs were detected, with an abundance of tRNA, ribosomal RNA, spliceosomal RNA, microRNA and small nucleolar RNA (Supplementary Table 1). This high-quality genome resource should help in understanding salinity tolerance and parental care in pearlspot specifically, and the evolution of cichlid fishes in general.

Table 2 Repeat Profile of Etroplus suratensis.

Table 3 Properties of protein-encoding genes.

Methods

Specimen for generating sequence data

A single male pearlspot specimen was used to generate the sequence data required for building the genome assembly. The lineage of the specimen was confirmed by analysis of the barcoding gene, Cytochrome C Oxidase I (CO I). Briefly, a partial CO I gene sequence of the specimen was generated (MG923355) following amplification with universal primers⁶. The CO I sequences of other accessions under the subfamily Etroplinae were sourced from the BOLD system v4 database⁷, along with an Oreochromis niloticus accession as the outgroup (Supplementary Table 2). The sequences were aligned with the MUSCLE module of MEGA X⁸,⁹ and used to build a Maximum Likelihood tree with the HKY + G + I model and 1000 bootstrap iterations in MEGA X⁹ (Supplementary Figure S1).

DNA sequence reads

In this study, three types of DNA sequence data were generated: short reads, long high-fidelity (HiFi) reads and chromatin-linked (HiC) reads. The short reads were used to assess the genome properties, the HiFi reads were used for generating assembly contigs and the HiC reads were used for building assembly scaffolds. Briefly, high molecular weight genomic DNA was isolated from muscle tissue of a single male fish using the QIAGEN Genomic-tip 100/G Midi kit (Qiagen, Hilden, Germany). DNA quantity was measured with a Qubit 3.0 fluorometer (Thermo Fisher Scientific, Massachusetts, USA) using the DNA HS assay kit (Thermo Fisher Scientific, Massachusetts, USA), and DNA purity was checked with a NanoDrop 2000 (Thermo Fisher Scientific, Massachusetts, USA). DNA integrity was evaluated on a 1% agarose gel and on the Femto Pulse system (Agilent Technologies, California, USA). DNA shearing was performed on the Megaruptor 3 system (Diagenode, Belgium). Three separate sequencing libraries were constructed using the SMRTbell Express Template Preparation Kit 2.0 (Pacific Biosciences, California, USA). The libraries were purified using AMPure PB beads (Pacific Biosciences, California, USA), and the purified libraries were treated with the SMRTbell Enzyme Cleanup Kit 2.0 to remove any unbound adapters and damaged DNA. The libraries were size selected using BluePippin (Sage Science, USA) with the 0.75% DF Marker S1 High Pass cassette. The size-selected libraries were subjected to primer annealing and polymerase binding using the Sequel II Binding Kit 2.2. About 75 to 80 pM of each library was loaded onto an individual 8M SMRT Cell (n = 3) and sequenced on the PacBio Sequel II system in CCS/HiFi mode to generate polymerase reads. The raw polymerase reads were then processed with the ccs algorithm v6.4.0 (--min-passes 3, --min-snr 2.5, --min-rq 0.99) to generate HiFi reads. Relative to the polymerase reads, the HiFi read recovery was 50.7% in terms of read number and 4.87% in terms of bases (Table 4).

Table 4 Statistics of HiFi reads generated on Pacbio Sequel II using 3 SMRT cells.

The same DNA was used to construct a short-read sequencing library with the KAPA HyperPlus kit (Roche, Basel, Switzerland) as per the manufacturer's protocol. The quality of the library was assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies, California, USA). The library, with an average insert size of 571 bp, was sequenced on an Illumina NovaSeq 6000. These short DNA reads were used only to assess the properties of the genome (Table 5).

Table 5 DNA short read data statistics.

The HiC library was constructed using the Proximo HiC Kit, animal (Phase Genomics, USA) as per the manufacturer's instructions. About 10 nM of the library was sequenced using an S4 flow cell on an Illumina NovaSeq 6000 in paired-end mode to generate 150 bp linked reads. The restriction enzymes used to prepare the HiC library were DpnII, DdeI, HinfI and MseI (Table 6).

Table 6 HiC raw data statistics.

In total, 9.975 million HiFi reads (98.18 Gb, 78.7 X), 992.5 million short reads (149.87 Gb, 120.2 X, 92.3% Q30 bases) and 4.398 billion HiC reads (659.83 Gb, 529.1 X, 89% Q30 bases) of DNA sequence data were generated.
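The depth values quoted above can be reproduced by dividing the sequenced bases by a genome size. The short Python sketch below assumes that the divisor is the final assembly length of 1.247 Gb; the text does not state which genome size was used, so this choice is an assumption, although the reported depths are consistent with it. The function and variable names are illustrative.

```python
# Assumed genome size: the final assembly length reported in this article.
ASSEMBLY_LENGTH_GB = 1.247


def depth(total_bases_gb, genome_size_gb=ASSEMBLY_LENGTH_GB):
    """Sequencing depth (X) = sequenced bases / genome size."""
    return total_bases_gb / genome_size_gb


# Raw data yields in Gb, as reported in the text.
datasets = {"HiFi": 98.18, "short reads": 149.87, "HiC": 659.83}

for name, bases_gb in datasets.items():
    print(f"{name}: {depth(bases_gb):.1f} X")
```

Under this assumption the sketch gives 78.7 X, 120.2 X and 529.1 X for the HiFi, short-read and HiC data, respectively.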

RNA sequence reads

The RNA sequence reads were generated from specimens at various developmental stages (1-, 3- and 15-day-old larvae) and from tissues (muscle, skin, kidney, liver, stomach, intestine, gill, brain, spleen, testis and heart) collected from the same adult male pearlspot. Briefly, total RNA was isolated using Trizol (DSS Takara, CA, USA) and purified with the Nucleospin RNA cleanup kit (Macherey-Nagel, Germany). RNA was quantified with a Qubit 3.0 fluorometer using the RNA HS assay kit (Thermo Fisher Scientific, Massachusetts, USA) and on a NanoDrop 2000. The quality and integrity of the RNA were checked on an Agilent 2100 Bioanalyzer. The cDNA libraries were prepared with the KAPA HyperPrep kit (Roche, Basel, Switzerland) and sequenced on an Illumina NovaSeq 6000 to generate 2 × 150 bp paired-end reads. The raw reads were trimmed with Trimmomatic v0.39¹⁰ to obtain clean reads with more than 90% Q30 bases (Table 7).

Table 7 Statistics of RNAseq data generated for various adult tissues and various developmental stages of Pearlspot.

Genome size assessment

Genome size was assessed by flow cytometry on a blood sample stained with propidium iodide, using a BD Accuri™ C6 flow cytometer¹¹,¹². Chicken erythrocytes from the BD™ DNA QC Particles kit (BD Biosciences, California, USA) were used as the control. The histogram data analyzed with BD Accuri™ C6 Plus software v1.0.23.1 indicated an estimated genome size of 1.22 pg (1.195 Gb) for pearlspot (Fig. 3a). Genome size was also assessed from DNA sequence reads using a k-mer approach. The DNA short reads were quality trimmed with Trimmomatic v0.39¹⁰ to obtain 78.6 Gb (79X) of clean reads with 96.2% Q30 bases. Genome properties were assessed from these clean reads using jellyfish v2.3.0¹³ and GenomeScope v2.0¹⁴, based on k-mer counts and coverage. The 21-mer histogram indicated an estimated genome length, repeat content and heterozygosity of 1.103 Gb, 28.3% and 0.228%, respectively, for the pearlspot genome (Fig. 3b).
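The relation between the DNA mass in picograms and the genome length in base pairs can be illustrated with the widely used conversion factor of 1 pg ≈ 0.978 × 10⁹ bp. The Python sketch below assumes this factor; the text does not state which factor was applied in this study, and the names are illustrative.

```python
# Widely used conversion: 1 pg of double-stranded DNA is about 0.978e9 bp.
# Assumption: the article does not state the exact factor it used.
BP_PER_PG = 0.978e9


def pg_to_gb(mass_pg):
    """Convert a DNA mass in picograms to a genome length in gigabases."""
    return mass_pg * BP_PER_PG / 1e9


print(f"{pg_to_gb(1.22):.3f} Gb")
```

With the rounded value of 1.22 pg, the sketch gives about 1.193 Gb, close to the 1.195 Gb reported above; the small difference is plausibly due to rounding of the picogram value.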

Genome assembly

The HiFi reads were initially screened with the NCBI Foreign Contamination Screen¹⁵ to discard contaminants originating from adaptors/vectors and foreign organisms. About 78 X coverage of HiFi reads was used to generate assembly contigs with Hifiasm v0.16.1¹⁶. This produced 375 contigs with a total length of 1.276 Gb and an N50 length of 36.16 Mb. The haplotigs and overlaps in the primary assembly were then removed with purge_dups¹⁷. Thereafter, about 3.2 billion HiC reads (473.99 Gb, 380 X, 94% Q30 bases), obtained after quality trimming of the raw reads with fastp v0.12.4¹⁸, were used for ordering and orienting the assembly contigs into final scaffolds with YaHS v1.1¹⁹. The final assembly consisted of 117 scaffolds with a total length of 1.247 Gb and an N50 length of 51.57 Mb. The completeness of the assembly was assessed by benchmarking against the single-copy orthologs of actinopterygii_odb10 (2021-02-19) using BUSCO v5.7.0²⁰. Of the 3,640 BUSCO orthologs, 3,584 were complete and single-copy, 29 were complete and duplicated, 21 were fragmented and 6 were missing. Thus 99.9% of the orthologs were detected in the assembly, either complete or fragmented, and 0.1% were missing. The phased assemblies, obtained by following a similar methodology, were assessed to be 99.2% and 98.7% complete based on BUSCO scores.
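The BUSCO percentages follow directly from the ortholog counts given above. The Python sketch below recomputes them; the dictionary keys and helper name are illustrative and do not reproduce BUSCO's own output format.

```python
# BUSCO ortholog counts reported for the final assembly.
counts = {
    "complete_single": 3584,
    "complete_duplicated": 29,
    "fragmented": 21,
    "missing": 6,
}

total = sum(counts.values())  # 3,640 orthologs in actinopterygii_odb10


def pct(n):
    """Percentage of all BUSCO orthologs, rounded to one decimal place."""
    return round(100 * n / total, 1)


complete = counts["complete_single"] + counts["complete_duplicated"]
print(f"Complete:   {pct(complete)}%")
print(f"Fragmented: {pct(counts['fragmented'])}%")
```

The complete and fragmented categories come to 99.3% and 0.6%, and the sum of these two rounded values corresponds to the 99.9% figure quoted for the assembly.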

Repeat prediction

RepeatModeler v2.0.5 (http://repeatmasker.org/RepeatModeler/), with LTR structural analysis enabled, was used with the rmblast v2.14.1 search engine to model de novo repeat elements in the pearlspot genome. The analysis identified 1,924 RepeatScout/RECON families and 281 LTRPipeline families. After removing redundant LTR families, a custom repeat library of 2,112 repeat families was established. RepeatMasker v4.1.5²¹, with the rmblast v2.14.1 search engine, was then used with the custom repeat library to identify and classify the repeat elements in the pearlspot genome assembly. The repeat elements accounted for 52.96% of the genome (Table 2), predominantly LINEs (20.2%), DNA transposons (16.71%) and LTR elements (3.85%).

Genome annotation

A strategy described earlier²²,²³, which combines evidence from Illumina RNAseq reads (generated in this study), PacBio Iso-Seq reads (GenBank accessions SRR28827909-916), ab initio methods and predicted proteins from related-species genomes (Supplementary Table 3), was used to predict protein-encoding genes (PEGs). Five lines of evidence were used: (1) ab initio predictions obtained with AUGUSTUS v3.4.0²⁴; (2) predictions with AUGUSTUS v3.4.0²⁴ based on hints generated from Iso-Seq reads using GMAP v2017.11.15-4²⁵; (3) predictions based on predicted proteins from genomes of related species using BRAKER v2.0.4²⁶ and GenomeThreader 1.7.3²⁷; (4) transcript evidence derived from Iso-Seq reads with GMAP v2017.11.15-4²⁵; and (5) transcript evidence derived from RNAseq reads with Hisat v2.2.1-4²⁸, Stringtie v2.2.1²⁹ and TransDecoder v5.7.0. All five lines of evidence were combined using EVidenceModeler v2.0.0³⁰ to arrive at the consensus prediction of PEGs. The pearlspot genome assembly was predicted to contain 27,192 PEGs with a mean of 9 exons per gene (Table 3). Annotation and pathway analysis³¹,³² of the PEGs (Table 8 and Supplementary Figures S2 to S5) were performed by combining results from the blastx tool against the Actinopterygii (txid7898) dataset of the NCBI non-redundant database with those from the InterProScan³³ and EggNOG³⁴ mapper modules of OmicsBox Tool v3.0.25³⁵.

Table 8 Annotation statistics of predicted protein-encoding genes.

Non-coding genes in the pearlspot genome were identified by searching the repeat-masked assembly against the Rfam database (http://rfam.xfam.org/) using cmscan from Infernal v1.1.2³⁶. A total of 18,089 non-coding RNAs were detected, with an abundance of tRNA, ribosomal RNA, spliceosomal RNA, microRNA and small nucleolar RNA (Supplementary Table 1).

Data Records

The raw datasets were deposited in the Sequence Read Archive (SRA) at NCBI under the accession numbers SRR27970333, SRR27999027-029, SRR28233220, SRR28003587-595, SRR28003597-599 and SRR28003601-602. The genome was submitted under the Genome category at NCBI with the genome assembly accession number GCA_041004005.1³⁷. All the raw datasets are linked to the BioProject PRJNA1076662³⁸ and the SRA study SRP489803³⁹. The genome annotations were submitted to the Figshare repository (https://doi.org/10.6084/m9.figshare.26303968.v3)³².
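Readers who wish to retrieve the individual runs may find it convenient to expand the abbreviated accession ranges into full accession numbers. The Python sketch below assumes that a range such as SRR27999027-029 means that the digits after the hyphen replace the same number of trailing digits of the first accession; this reading of the notation is an assumption, and the function name is hypothetical.

```python
def expand_accessions(spec):
    """Expand e.g. 'SRR27999027-029' into a list of full accessions.

    Assumption: the suffix after '-' replaces the trailing digits of the
    first accession and marks the inclusive end of the range.
    """
    if "-" not in spec:
        return [spec]
    first, last_suffix = spec.split("-")
    prefix = first.rstrip("0123456789")  # e.g. 'SRR'
    digits = first[len(prefix):]         # e.g. '27999027'
    start = int(digits)
    end = int(digits[: len(digits) - len(last_suffix)] + last_suffix)
    width = len(digits)
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


specs = [
    "SRR27970333", "SRR27999027-029", "SRR28233220",
    "SRR28003587-595", "SRR28003597-599", "SRR28003601-602",
]
runs = [acc for spec in specs for acc in expand_accessions(spec)]
print(len(runs), "runs, e.g.", runs[:4])
```

Under this reading, the six entries listed above expand to 19 individual run accessions.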

Technical Validation

The full-length mitochondrial genome (Fig. 4) of 16,467 bp was obtained as a single scaffold in the final assembly, suggesting that the 78 X coverage of HiFi reads and 380 X coverage of HiC reads were sufficient. The assembly generated for pearlspot is highly contiguous, as indicated by the contig N50 of 36.16 Mb. The assembly was assessed to contain 99.3% complete and 0.6% fragmented genes when benchmarked against the actinopterygii_odb10 (2021-02-19) lineage using BUSCO v5.7.0²⁰ (Fig. 5). About 98.03% of the assembly length is represented in the longest 24 scaffolds, indicating the chromosome-scale nature of the assembly. The consensus quality value and error rate of the genome assembly were assessed to be 60.0762 and 9.82596e-07, respectively, when validated with the k-mer (31-mer) based procedure of Merqury v1.3⁴⁰, indicating high base accuracy. The good alignment statistics (Table 9) obtained by aligning RNAseq reads and DNA short reads to the genome further validated the accuracy of the assembly. The chromosome-scale scaffolds were searched for telomere repeat sequences using tidk v0.2.0⁴¹ with the Cypriniformes clade repeat (AACCCT). All the scaffolds were observed to have telomere ends (Fig. 2b, Track 3). The genome assembly showed good synteny (Fig. 6) with the genomes of other closely related cichlid fishes. In total, 21,580 (79.36%) of the protein-encoding genes could be functionally annotated (Table 8).
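The consensus quality value and the error rate reported above are two expressions of the same quantity on the Phred scale, where QV = -10 log10(error rate). The Python sketch below shows the conversion in both directions; the function names are illustrative and are not part of Merqury.

```python
import math


def qv_to_error_rate(qv):
    """Phred-scaled quality value -> per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Per-base error probability -> Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


print(f"QV 60.0762 -> error rate {qv_to_error_rate(60.0762):.3e}")
print(f"error rate 9.82596e-07 -> QV {error_rate_to_qv(9.82596e-07):.2f}")
```

A QV of about 60 therefore corresponds to roughly one erroneous base per million bases of consensus sequence.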

Table 9 Read alignment statistics of various short reads generated in the study against the assembled genome of E. suratensis.

Code availability

All data processing programs were executed with default parameters unless otherwise specified in the Methods section. There were no custom scripts or code utilized in this study.

References

1. Schliewen, U. K., Tautz, D. & Pääbo, S. Sympatric speciation suggested by monophyly of crater lake cichlids. Nature 368, 629–632 (1994).
2. Ronco, F. et al. Drivers and dynamics of a massive adaptive radiation in cichlid fishes. Nature 589, 76–81 (2021).
3. Kocher, T. D. Adaptive evolution and explosive speciation: the cichlid fish model. Nat Rev Genet 5, 288–298 (2004).
4. Ward, J. A. & Wyman, R. L. Ethology and ecology of cichlid fishes of the genus Etroplus in Sri Lanka: preliminary findings. Environ Biol Fishes 2, 137–145 (1977).
5. Matschiner, M., Böhne, A., Ronco, F. & Salzburger, W. The genomic timeline of cichlid fish diversification across continents. Nat Commun 11 (2020).
6. Ward, R. D., Zemlak, T. S., Innes, B. H., Last, P. R. & Hebert, P. D. N. DNA barcoding Australia’s fish species. Philosophical Transactions of the Royal Society B: Biological Sciences 360, 1847–1857 (2005).
7. Ratnasingham, S. & Hebert, P. D. N. BOLD: The Barcode of Life Data System: Barcoding. Mol Ecol Notes 7, 355–364 (2007).
8. Edgar, R. C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32, 1792–1797 (2004).
9. Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol 35, 1547–1549 (2018).
10. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
11. Swathi, A., Shekhar, M. S., Katneni, V. K. & Vijayan, K. K. Genome size estimation of brackishwater fishes and penaeid shrimps by flow cytometry. Mol Biol Rep 45, 951–960 (2018).
12. Raymond, J. A. J. et al. Comparative genome size estimation of different life stages of grey mullet, Mugil cephalus Linnaeus, 1758 by flow cytometry. Aquac Res 53, 1151–1158 (2022).
13. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
14. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11 (2020).
15. Astashyn, A. et al. Rapid and sensitive detection of genome contamination at scale with FCS-GX. Genome Biol 25 (2024).
16. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18, 170–175 (2021).
17. Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).
18. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
19. Zhou, C., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics 39 (2023).
20. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol 38, 4647–4654 (2021).
21. Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0. 2013–2016.
22. Katneni, V. K. et al. A Superior Contiguous Whole Genome Assembly for Shrimp (Penaeus indicus). Front Mar Sci 8 (2022).
23. Shekhar, M. S. et al. First Report of Chromosome-Level Genome Assembly for Flathead Grey Mullet, Mugil cephalus (Linnaeus, 1758). Front Genet 13, 911446 (2022).
24. Hoff, K. J. & Stanke, M. Predicting Genes in Single Genomes with AUGUSTUS. Curr Protoc Bioinformatics 65, 1–54 (2019).
25. Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
26. Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: Automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform 3, 1–11 (2021).
27. Gremme, G. Computational gene structure prediction. Staats-und Universitätsbibliothek Hamburg Carl von Ossietzky, (2012).
28. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol 37, 907–915 (2019).
29. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33, 290–295 (2015).
30. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol 9, 1–22 (2008).
31. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: New perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 45, D353–D361 (2017).
32. Vinaya Kumar Katneni et al. Etroplus suratensis genome and annotation. Figshare at https://doi.org/10.6084/m9.figshare.26303968.v3 (2024).
33. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
34. Huerta-Cepas, J. et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol Biol Evol 34, 2115–2122 (2017).
35. Omicsbox. OmicsBox-Bioinformatics made easy (Version 3.0.25) (2019).
36. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
37. NCBI Genome Assembly Database http://identifiers.org/assembly:GCA_041004005.1 (2024).
38. NCBI BioProject http://identifiers.org/bioproject:PRJNA1076662 (2024).
39. NCBI Sequence Read Archive http://identifiers.org/insdc.sra:SRP489803 (2024).
40. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: Reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21 (2020).
41. Brown, M., la Rosa, P. M. & Mark, B. A Telomere Identification Toolkit. Zenodo https://doi.org/10.5281/zenodo.10091385 (2023).

Acknowledgements

This work was carried out in the project entitled ‘Unravelling signatures of growth and salinity adaptation in Etroplus suratensis through omics approaches’ funded by Department of Biotechnology, Government of India (BT/PR34518/AAQ/3/965/2019). The authors are thankful to Director, ICAR-CIBA for providing necessary support in executing this research work. We acknowledge Nucleome Informatics Private Limited for help in generating the sequence data.

Author information

Authors and Affiliations

Centre for Bioinformatics, Nutrition Genetics and Biotechnology Division, ICAR - Central Institute of Brackishwater Aquaculture, No 75, Santhome High Road, MRC Nagar, Chennai, 600 028, Tamil Nadu, India
Vinaya Kumar Katneni, Karthic Krishnan, Sudheesh K Prabhudas, Roja Jayaraman, Nida Quraishi, Ashok Kumar Jangam & Nimisha Kaikkolante
Nutrition Genetics and Biotechnology Division, ICAR - Central Institute of Brackishwater Aquaculture, No 75, Santhome High Road, MRC Nagar, Chennai, 600 028, Tamil Nadu, India
Kumaraguru Vasagam & Kumaravel Jayaraman
Crustacean Culture Division, ICAR-Central Institute of Brackishwater Aquaculture, No 75, Santhome High Road, MRC Nagar, Chennai, 600028, Tamil Nadu, India
Jesudhas Raymond Jani Angel
Aquatic Animal Health and Environment Division, ICAR-Central Institute of Brackishwater Aquaculture, No 75, Santhome High Road, MRC Nagar, Chennai, 600028, Tamil Nadu, India
S Shekhar Mudagandur

Contributions

V.K.K. and M.S.S. conceived the study. V.K.K., K.K., S.K.P., R.J., A.K.J., N.K. and K.J. performed analysis. N.Q., K.V. and J.R.J.A. prepared the material. V.K.K., A.K.J., K.K. and S.K.P. drafted the manuscript. All authors contributed to final manuscript editing.

Corresponding author

Correspondence to Vinaya Kumar Katneni.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary File 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Katneni, V.K., Krishnan, K., Prabhudas, S.K. et al. Genome assembly at chromosome scale with telomere ends for Pearlspot, Etroplus suratensis. Sci Data 11, 1226 (2024). https://doi.org/10.1038/s41597-024-04096-0

Received: 22 July 2024
Accepted: 06 November 2024
Published: 13 November 2024
DOI: https://doi.org/10.1038/s41597-024-04096-0