Analysis Name | Solanum tuberosum 'DM 1-3 516 R44 (cultivar)' DM8.1 Assembly & Annotation |
Sequencing technology | ONT ultra-long reads, Hi-C reads and HiFi reads |
Assembly method | NextDenovo |
Release Date | 2023-02-06 |
Yang X, Zhang L, Guo X, Xu J, Zhang K, Yang Y, Yang Y, Jian Y, Dong D, Huang S, Cheng F, Li G. The gap-free potato genome assembly reveals large tandem gene clusters of agronomical importance in highly repeated genomic regions. Mol Plant. 2023 Feb 6;16(2):314-317. doi: 10.1016/j.molp.2022.12.010.
Abstract
Potato is a vital food security crop and is ranked as the world’s third most important food crop after rice and wheat. In 2011, the first genome assembly of a doubled monoploid potato DM1-3 516 R44 (DM) was released (Potato Genome Sequencing Consortium, 2011), which has been widely used as one of the most popular reference genomes in the last decade and served as a valuable resource in plant genomics and potato genetics community (Leisner et al., 2018; Yang et al., 2020; Zheng et al., 2020). The latest version of DM genome assembly (v6.1) (Pham et al., 2020) served as a good reference and quality control in studies of diploid and tetraploid potatoes (Zhou et al., 2020; Bao et al., 2022; Hoopes et al., 2022; Sun et al., 2022; Tang et al., 2022). However, 161 gaps remain in DM6.1 (v6.1), and the centromere and telomere structures are incomplete. Considering the importance of the DM genome in potato genomics, genetics, and breeding studies, generating a complete genome assembly of DM is of great importance.
In this study, a telomere-to-telomere gap-free genome of DM (DM8.1) (Figure 1A) was assembled by combining Oxford Nanopore Technologies (ONT) ultra-long read sequencing (119.81× coverage) and Hi-C sequencing (130.57×) (Supplemental Table 1), assisted by multiple gap-closing strategies coupled with high-fidelity (HiFi) reads from circular consensus sequencing.
A total of 179 contigs with a summed size of 773.36 Mb and a contig N50 of 59.72 Mb were obtained after initial genome assembly, polishing, and decontamination. Hi-C reads further anchored 37 of the 179 contigs into 12 chromosomes (Supplemental Figure 1; Supplemental Table 2), accounting for 95.53% (738.82 Mb) of the total assembly, and we named it preDM8. For the 142 (34.53 Mb) unanchored contigs, over 98% are short sequences (<1 Mb), and all could be aligned to chromosomes with high similarity, indicating that these were repetitive or redundant sequences. The preDM8 assembly is more contiguous than DM6.1 and the potato pan-genome assemblies (Tang et al., 2022) (Supplemental Figure 2). However, there were 25 gaps in preDM8. Three methods were further adopted to close these gaps (Supplemental Figure 3A; Supplemental Table 3). First, we aligned the ONT reads to preDM8, and reads mapped to the flanking regions of gaps were collected and assembled, which successfully closed 14 gaps. Second, based on the syntenic homologous fragments between preDM8 and DM6.1, three gaps were closed with the DM6.1 consecutive sequences that covered these gaps in preDM8. Third, target-sequence amplification experiments (Supplemental Figure 3B) and HiFi sequencing were performed, which successfully closed the remaining eight gaps (Supplemental Figures 3C and 4). Finally, we generated the gap-free genome assembly of DM and named it DM8.1 (Figure 1A; Supplemental Table 4).
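The gap counts quoted above (25 in preDM8, none in DM8.1) can be checked on any FASTA assembly with a short script. The following is a minimal sketch, not the authors' pipeline: it assumes gaps are represented as runs of `N` characters, and the function names `read_fasta`, `find_gaps`, and `gaps_per_sequence` are hypothetical helpers introduced here for illustration.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_gaps(seq: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) of each N-run of at least min_len."""
    return [
        (m.start() + 1, m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


def gaps_per_sequence(lines: Iterable[str], min_len: int = 1) -> Dict[str, List[Tuple[int, int]]]:
    """Map each sequence name to its list of gap intervals."""
    return {name: find_gaps(seq, min_len) for name, seq in read_fasta(lines)}


# Toy input (made-up sequences, not real DM8.1 data).
toy = [">chrA", "ACGTNNNNAC", "GTNNAC", ">chrB", "ACGTACGT"]
gaps = gaps_per_sequence(toy)
print(gaps)  # {'chrA': [(5, 8), (13, 14)], 'chrB': []}
```

To run it on the released assembly, the same function can be given a text handle such as `gzip.open("DM8.1_genome.fasta.gz", "rt")`. A gap-free assembly should yield an empty list for every chromosome, provided the assumption about `N` runs holds for that file.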
Assembly statistics
Assembly Version | DM8.1 |
Genome size | 773,468,225 bp |
Number of chromosomes | 12 |
Number of scaffolds | 154 |
Scaffold N50 | 60,415,087 bp |
Scaffold L50 | 5 |
Assembly level | Chromosome |
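For readers unfamiliar with the N50 and L50 values in the table, here is a minimal sketch of how they are conventionally computed from a list of sequence lengths. The function name `n50_l50` is a hypothetical helper, and the lengths in the example are invented, not the DM8.1 scaffold lengths.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    length; L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: cumulative sum always reaches half")


# Invented lengths: total is 300, so half is 150; 80 + 70 reaches it.
example = [10, 20, 30, 40, 50, 70, 80]
print(n50_l50(example))  # (70, 2)
```

Feeding in the per-sequence lengths of the assembly FASTA should, under this conventional definition, give the scaffold N50 and L50 that the table reports.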
The Solanum tuberosum 'DM 1-3 516 R44 (cultivar)' DM8.1 Assembly file is available in FASTA format.
Downloads
Chromosomes (FASTA file) | DM8.1_genome.fasta.gz |
The Solanum tuberosum 'DM 1-3 516 R44 (cultivar)' DM8.1 genome gene prediction files are available in GFF3 and FASTA format.
Downloads
Genes (GFF3 file) | DM8.1_gene.gff3.gz |
CDS sequences (FASTA file) | DM8.1_gene.cds.fasta.gz |
Protein sequences (FASTA file) | DM8.1_gene.pep.fasta.gz |
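As a quick sanity check after downloading the gene predictions, one can tally the feature types in the GFF3 file. This is a sketch under the assumption that the file follows the standard nine-column, tab-separated GFF3 layout; `count_feature_types` is a hypothetical helper and the gene IDs in the toy input are invented.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count the values of the GFF3 'type' column (column 3)."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; stop parsing features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # ignore malformed rows rather than guessing
        counts[cols[2]] += 1
    return counts


# Toy input with invented identifiers (not taken from DM8.1_gene.gff3.gz).
toy_gff = [
    "##gff-version 3\n",
    "chr01\ttoy\tgene\t100\t900\t.\t+\t.\tID=geneA\n",
    "chr01\ttoy\tmRNA\t100\t900\t.\t+\t.\tID=geneA.1;Parent=geneA\n",
    "chr01\ttoy\tCDS\t100\t400\t.\t+\t0\tParent=geneA.1\n",
    "chr01\ttoy\tCDS\t600\t900\t.\t+\t2\tParent=geneA.1\n",
]
print(count_feature_types(toy_gff))
```

For the real file, a text handle from `gzip.open("DM8.1_gene.gff3.gz", "rt")` can be passed in place of the toy list.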
Functional annotation for the Solanum tuberosum 'DM 1-3 516 R44 (cultivar)' DM8.1 is available for download below. The proteins were analyzed using InterProScan to assign InterPro domains (Pfam).
Downloads
Domain from InterProScan | - |
Summary
Query | Chr | Chr size (bp) | Coordinates | BLASTn Hit | BLASTn %ID | Domain |
SLF15 | chr01 | 88922896 | 3684553-3683294 | SL2.31ch01:2198500-2196501_SLF15 | 93.889 | F-box domain |
SLF16 | chr01 | 88922896 | 5240318-5239104 | SL2.31ch01:2723400-2721301_SLF16 | 95.093 | F-box domain |
SLF22 | chr01 | 88922896 | 36065232-36064093 | KU960924.1, SLF22 | 95.351 | F-box domain |
SLF17 | chr01 | 88922896 | 37274506-37273304 | KU960921.1, SLF17 | 92.555 | F-box domain |
S-RNase | chr01 | 88922896 | 38418500-38418736,38418824-38419237 | XM_006347185.1, ribonuclease S-F11-like | 100 | Ribonuclease T2 family |
SLF22-2 | chr01 | 88922896 | 38739674-38740825 | KU960924.1, SLF22 | 79.638 | F-box domain |
SLF6 | chr01 | 88922896 | 38940643-38939498 | KU987626.1, SLF6 | 83.014 | F-box domain |
SLF5 | chr01 | 88922896 | 41055267-41054083 | KU987627.2, SLF5 | 94.024 | F-box domain |
SLF12 | chr01 | 88922896 | 41124574-41125740 | SL2.31ch01:45516501-45518600_SLF12Ψ | 88.974 | F-box domain |
SLF5-2 | chr01 | 88922896 | 41425512-41424343 | KJ814884.1, SLF5 | 95.299 | F-box domain |
SLF7 | chr01 | 88922896 | 41498325-41497156 | KJ814851.1, SLF7 | 93.095 | F-box domain |
SLF20 | chr01 | 88922896 | 41553335-41552169 | KU960922.1, SLF20 | 95.63 | F-box domain |
SLF21 | chr01 | 88922896 | 42095600-42094380 | KU960923.1, SLF21 | 92.5 | F-box domain |
SLF9 | chr01 | 88922896 | 42962873-42961731 | KU987631.1, SLF9 | 92.826 | F-box domain |
SLF11Ψ | chr01 | 88922896 | 44561272-44562433 | NM_001323461.1, SLF11 | 93.782 | - |
SLF6-2 | chr01 | 88922896 | 45410759-45409617 | KU987626.1, SLF6 | 89.323 | F-box domain |
SLF13 | chr01 | 88922896 | 45915125-45913923 | KJ814856.1, SLF13 | 94.015 | F-box domain |
SLF18 | chr01 | 88922896 | 60028682-60029800 | SL2.31ch01:67739501-67741500_SLF18 | 94.459 | F-box domain |
SLF18-2 | chr01 | 88922896 | 60034523-60035641 | SL2.31ch01:67739501-67741500_SLF18 | 94.37 | F-box domain |
SLF19 | chr01 | 88922896 | 60065236-60064124 | SL2.31ch01:67757501-67759600_SLF19 | 93.057 | F-box domain |
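The Coordinates column mixes two notations: some entries list the larger position first, and the S-RNase entry lists two comma-separated segments. The sketch below shows one way to normalize them. It rests on two assumptions that the table itself does not state: that a start greater than the end denotes the minus strand, and that coordinates are 1-based and inclusive. The helper names `split_row`, `parse_coordinates`, and `total_length` are hypothetical.

```python
from typing import List, Tuple


def split_row(line: str) -> List[str]:
    """Split one pipe-delimited table row into trimmed fields."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_coordinates(field: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Return (strand, segments), each segment given as (low, high).

    Assumption: a segment written high-low lies on the minus strand.
    """
    strands = set()
    segments = []
    for part in field.split(","):
        first, second = (int(x) for x in part.strip().split("-"))
        strands.add("-" if first > second else "+")
        segments.append((min(first, second), max(first, second)))
    if len(strands) != 1:
        raise ValueError(f"segments disagree on strand: {field!r}")
    return strands.pop(), segments


def total_length(segments: List[Tuple[int, int]]) -> int:
    """Sum segment lengths, assuming 1-based inclusive coordinates."""
    return sum(high - low + 1 for low, high in segments)


# Two rows copied from the summary table above.
slf15 = split_row("SLF15 | chr01 | 88922896 | 3684553-3683294 | SL2.31ch01:2198500-2196501_SLF15 | 93.889 | F-box domain |")
srnase = split_row("S-RNase | chr01 | 88922896 | 38418500-38418736,38418824-38419237 | XM_006347185.1, ribonuclease S-F11-like | 100 | Ribonuclease T2 family |")

for row in (slf15, srnase):
    strand, segs = parse_coordinates(row[3])
    print(row[0], strand, segs, total_length(segs))
# SLF15 - [(3683294, 3684553)] 1260
# S-RNase + [(38418500, 38418736), (38418824, 38419237)] 651
```

Both totals are multiples of three, which is consistent with the inclusive-coordinate assumption if the intervals are coding sequences, though it does not prove it.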
Nucleotide
Protein