Solanum tuberosum 'DM 1-3 516 R44 (cultivar)' DM8.1 Assembly & Annotation

Overview

Analysis Name Solanum tuberosum 'DM 1-3 516 R44 (cultivar)' DM8.1 Assembly & Annotation
Sequencing technology ONT ultra-long reads, Hi-C reads and HiFi reads
Assembly method NextDenovo
Release Date 2023-02-06
Reference Publication(s)

Yang X, Zhang L, Guo X, Xu J, Zhang K, Yang Y, Yang Y, Jian Y, Dong D, Huang S, Cheng F, Li G. The gap-free potato genome assembly reveals large tandem gene clusters of agronomical importance in highly repeated genomic regions. Mol Plant. 2023 Feb 6;16(2):314-317. doi: 10.1016/j.molp.2022.12.010.

Abstract

Potato is a vital food-security crop, ranked as the world’s third most important food crop after rice and wheat. In 2011, the first genome assembly of the doubled monoploid potato DM1-3 516 R44 (DM) was released (Potato Genome Sequencing Consortium, 2011). Over the last decade it has been one of the most widely used reference genomes and a valuable resource for the plant genomics and potato genetics communities (Leisner et al., 2018; Yang et al., 2020; Zheng et al., 2020). The most recent previous version of the DM assembly, v6.1 (Pham et al., 2020), has served as a reference and for quality control in studies of diploid and tetraploid potatoes (Zhou et al., 2020; Bao et al., 2022; Hoopes et al., 2022; Sun et al., 2022; Tang et al., 2022). However, DM6.1 still contains 161 gaps, and its centromere and telomere structures are incomplete. Given the importance of the DM genome in potato genomics, genetics, and breeding, generating a complete DM genome assembly is of great importance.

In this study, a telomere-to-telomere, gap-free genome of DM (DM8.1) (Figure 1A) was assembled by combining Oxford Nanopore Technologies (ONT) ultra-long read sequencing (119.81× coverage) with Hi-C sequencing (130.57×) (Supplemental Table 1), assisted by multiple gap-closing strategies that used high-fidelity (HiFi) reads from circular consensus sequencing.

Initial genome assembly, polishing, and decontamination produced 179 contigs with a total size of 773.36 Mb and a contig N50 of 59.72 Mb. Hi-C reads anchored 37 of the 179 contigs onto 12 chromosomes (Supplemental Figure 1; Supplemental Table 2), accounting for 95.53% (738.82 Mb) of the total assembly; this version was named preDM8. Of the 142 unanchored contigs (34.53 Mb), over 98% are short (< 1 Mb), and all could be aligned to the chromosomes with high similarity, indicating that they are repetitive or redundant sequences. preDM8 is more contiguous than DM6.1 and the potato pan-genome assemblies (Tang et al., 2022) (Supplemental Figure 2), but it still contained 25 gaps.

Three methods were used to close these gaps (Supplemental Figure 3A; Supplemental Table 3). First, ONT reads were aligned to preDM8, and reads mapping to the regions flanking each gap were collected and assembled, closing 14 gaps. Second, based on syntenic homologous fragments between preDM8 and DM6.1, three gaps were closed with contiguous DM6.1 sequences spanning the corresponding gaps in preDM8. Third, targeted sequence amplification (Supplemental Figure 3B) and HiFi sequencing closed the remaining eight gaps (Supplemental Figures 3C and 4). The resulting gap-free genome assembly of DM was named DM8.1 (Figure 1A; Supplemental Table 4).
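
Gap counts such as the 25 gaps in preDM8 (14 + 3 + 8 closed by the three methods) are usually measured by scanning the assembled sequence for runs of unknown bases. The page does not say how preDM8 gaps were encoded, so the sketch below assumes the common convention of N/n runs; `find_gaps`, `is_gap_free`, and the `min_len` threshold are hypothetical names for illustration, not the authors' pipeline.

```python
from __future__ import annotations

import re


def find_gaps(seq: str, min_len: int = 1) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of N/n runs.

    Assumption: gaps are encoded as runs of 'N' or 'n'; min_len is a
    hypothetical minimum run length for a run to count as a gap.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


def is_gap_free(seqs: dict[str, str]) -> bool:
    """True if no sequence in the mapping contains an N/n run."""
    return all(not find_gaps(s) for s in seqs.values())
```

Under this convention, a gap-free assembly such as DM8.1 would return an empty list for every chromosome.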

Assembly statistics

Assembly Version DM8.1
Genome size 773,468,225 bp
Number of chromosomes 12
Number of scaffolds 154
Scaffold N50 60,415,087 bp
Scaffold L50 5
Assembly level Chromosome
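
Scaffold N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly; L50 is the number of scaffolds in that set. The sketch below implements that standard definition; it is not necessarily the tool used to produce the statistics above.

```python
from __future__ import annotations


def n50_l50(lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a list of sequence lengths."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest until at least
    # half of the total length is covered.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running * 2 >= total:
            return length, count
    raise AssertionError("unreachable: the loop always reaches half the total")
```

Read against the table, L50 = 5 means the five longest of the 154 scaffolds together cover at least half of the 773,468,225 bp assembly, and the fifth of them is 60,415,087 bp long.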

Assembly

The Solanum tuberosum 'DM 1-3 516 R44 (cultivar)' DM8.1 genome assembly file is available in FASTA format.

Downloads

Chromosomes (FASTA file) DM8.1_genome.fasta.gz
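
The chromosome file is gzip-compressed FASTA. A minimal, dependency-free sketch for streaming it and tallying record lengths, for example to compare against the 154 scaffolds and 773,468,225 bp listed above. It assumes the file has been downloaded locally, and the `chr` name prefix used to separate chromosomes from unanchored scaffolds is a hypothetical convention that this page does not confirm.

```python
from __future__ import annotations

import gzip
from typing import Iterable, Iterator


def fasta_lengths(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Yield (record_id, sequence_length) for each record in FASTA lines."""
    name, length = None, 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # The record ID is the first whitespace-delimited header token.
            name, length = line[1:].split()[0], 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def summarize(lengths: dict[str, int], chrom_prefix: str = "chr") -> dict[str, int]:
    """Count records, bases, and (hypothetically) chromosome-named records."""
    return {
        "records": len(lengths),
        "total_bp": sum(lengths.values()),
        "chromosomes": sum(1 for n in lengths if n.startswith(chrom_prefix)),
    }


def summarize_fasta_gz(path: str) -> dict[str, int]:
    # Text mode ("rt") decompresses on the fly, one line at a time.
    with gzip.open(path, "rt") as fh:
        return summarize(dict(fasta_lengths(fh)))
```

Usage: `summarize_fasta_gz("DM8.1_genome.fasta.gz")`, with the path adjusted to wherever the download was saved.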

Gene Predictions

The Solanum tuberosum 'DM 1-3 516 R44 (cultivar)' DM8.1 genome gene prediction files are available in GFF3 and FASTA format.

Downloads

Genes (GFF3 file) DM8.1_gene.gff3.gz
CDS sequences (FASTA file) DM8.1_gene.cds.fasta.gz
Protein sequences (FASTA file) DM8.1_gene.pep.fasta.gz
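
GFF3 is a nine-column, tab-delimited format (seqid, source, type, start, end, score, strand, phase, attributes). A minimal sketch for tallying feature types, such as how many `gene` and `mRNA` records `DM8.1_gene.gff3.gz` contains. It assumes the file follows the GFF3 specification, and for brevity it does not decode URL-escaped attribute values.

```python
from __future__ import annotations

import gzip
from collections import Counter
from typing import Iterable, Iterator


def parse_gff3(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per GFF3 feature line."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment, or directive
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        # Column 9 holds key=value pairs separated by ';' (values left escaped).
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        yield {
            "seqid": cols[0],
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attrs": attrs,
        }


def count_types(lines: Iterable[str]) -> Counter:
    """Count features by their type column (gene, mRNA, exon, CDS, ...)."""
    return Counter(f["type"] for f in parse_gff3(lines))


def count_types_gz(path: str) -> Counter:
    with gzip.open(path, "rt") as fh:
        return count_types(fh)
```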

Functional Analysis

Functional annotation for the Solanum tuberosum 'DM 1-3 516 R44 (cultivar)' DM8.1 is available for download below. The predicted proteins were analyzed with InterProScan to assign InterPro (Pfam) domains.
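
InterProScan's tab-separated output lists one match per line: protein accession in column 1, member database (for example Pfam) in column 4, signature accession and description in columns 5 and 6, and match start and stop in columns 7 and 8. Because no domain file is linked on this page, the sketch below assumes that standard TSV layout rather than any DM8.1-specific file; it collects Pfam domains per protein.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def pfam_domains(lines: Iterable[str]) -> dict[str, list[tuple[str, str, int, int]]]:
    """Map protein ID -> [(Pfam accession, description, start, stop), ...].

    Assumes InterProScan's standard TSV column order; non-Pfam rows are skipped.
    """
    domains: dict[str, list[tuple[str, str, int, int]]] = defaultdict(list)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8 or cols[3] != "Pfam":
            continue
        domains[cols[0]].append((cols[4], cols[5], int(cols[6]), int(cols[7])))
    # Sort each protein's domains by start position.
    return {prot: sorted(hits, key=lambda h: h[2]) for prot, hits in domains.items()}
```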

Downloads

Domain from InterProScan -

S genes

Summary

Query | Chr | Chr size (bp) | Coordinates | BLASTn hit | BLASTn identity (%) | Domain
SLF15 | chr01 | 88922896 | 3684553-3683294 | SL2.31ch01:2198500-2196501_SLF15 | 93.889 | F-box domain
SLF16 | chr01 | 88922896 | 5240318-5239104 | SL2.31ch01:2723400-2721301_SLF16 | 95.093 | F-box domain
SLF22 | chr01 | 88922896 | 36065232-36064093 | KU960924.1, SLF22 | 95.351 | F-box domain
SLF17 | chr01 | 88922896 | 37274506-37273304 | KU960921.1, SLF17 | 92.555 | F-box domain
S-RNase | chr01 | 88922896 | 38418500-38418736, 38418824-38419237 | XM_006347185.1, ribonuclease S-F11-like | 100 | Ribonuclease T2 family
SLF22-2 | chr01 | 88922896 | 38739674-38740825 | KU960924.1, SLF22 | 79.638 | F-box domain
SLF6 | chr01 | 88922896 | 38940643-38939498 | KU987626.1, SLF6 | 83.014 | F-box domain
SLF5 | chr01 | 88922896 | 41055267-41054083 | KU987627.2, SLF5 | 94.024 | F-box domain
SLF12 | chr01 | 88922896 | 41124574-41125740 | SL2.31ch01:45516501-45518600_SLF12Ψ | 88.974 | F-box domain
SLF5-2 | chr01 | 88922896 | 41425512-41424343 | KJ814884.1, SLF5 | 95.299 | F-box domain
SLF7 | chr01 | 88922896 | 41498325-41497156 | KJ814851.1, SLF7 | 93.095 | F-box domain
SLF20 | chr01 | 88922896 | 41553335-41552169 | KU960922.1, SLF20 | 95.63 | F-box domain
SLF21 | chr01 | 88922896 | 42095600-42094380 | KU960923.1, SLF21 | 92.5 | F-box domain
SLF9 | chr01 | 88922896 | 42962873-42961731 | KU987631.1, SLF9 | 92.826 | F-box domain
SLF11Ψ | chr01 | 88922896 | 44561272-44562433 | NM_001323461.1, SLF11 | 93.782 | -
SLF6-2 | chr01 | 88922896 | 45410759-45409617 | KU987626.1, SLF6 | 89.323 | F-box domain
SLF13 | chr01 | 88922896 | 45915125-45913923 | KJ814856.1, SLF13 | 94.015 | F-box domain
SLF18 | chr01 | 88922896 | 60028682-60029800 | SL2.31ch01:67739501-67741500_SLF18 | 94.459 | F-box domain
SLF18-2 | chr01 | 88922896 | 60034523-60035641 | SL2.31ch01:67739501-67741500_SLF18 | 94.37 | F-box domain
SLF19 | chr01 | 88922896 | 60065236-60064124 | SL2.31ch01:67757501-67759600_SLF19 | 93.057 | F-box domain
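
Several ranges in the Coordinates column are written high-to-low (for example SLF15 at 3684553-3683294), and the S-RNase entry has two segments. In BLAST output a reversed range conventionally marks a minus-strand hit; assuming that convention and 1-based inclusive coordinates apply here (the page states neither), the sketch below parses a Coordinates cell into strand, outer bounds, and summed segment length. `parse_coordinates` is a hypothetical helper.

```python
from __future__ import annotations


def parse_coordinates(text: str) -> tuple[str, int, int, int]:
    """Parse 'a-b' or 'a-b, c-d' into (strand, start, end, summed_length).

    Assumptions: coordinates are 1-based and inclusive, and a range written
    high-to-low denotes the minus strand.
    """
    segments = []
    for part in text.split(","):
        first, second = (int(x) for x in part.strip().split("-"))
        segments.append((first, second))
    strands = {"-" if a > b else "+" for a, b in segments}
    if len(strands) != 1:
        raise ValueError(f"mixed orientation in {text!r}")
    start = min(min(a, b) for a, b in segments)
    end = max(max(a, b) for a, b in segments)
    total = sum(abs(b - a) + 1 for a, b in segments)
    return strands.pop(), start, end, total
```

Under these assumptions, SLF15 parses to the minus strand with a 1,260 bp span, and the two S-RNase segments sum to 651 bp.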


© 2023 National Genomics Data Center, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences