Solanum lycopersicum 'Heinz 1706 (cultivar)' SL5.0 Assembly & Annotation

Overview

Analysis Name Solanum lycopersicum 'Heinz 1706 (cultivar)' SL5.0 Assembly & Annotation
Sequencing technology HiFi reads and Hi-C reads
Assembly method Flye v.2.7, HiCanu v.2.0 and Hifiasm v.0.13
Release Date 2022-06-08
Reference Publication(s)

Zhou, Y., Zhang, Z., Bao, Z., Li, H., Lyu, Y., Zan, Y., … Huang, S. (2022). Graph pangenome captures missing heritability and empowers tomato breeding. Nature, 606(7914), 527–534. https://www.nature.com/articles/s41586-022-04808-9.

Abstract

Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.

Assembly statistics

Assembly Source AGI_CAAS_Shenzhen
Assembly Version SL5.0
Annotation Source AGI_CAAS_Shenzhen
Annotation Version ITAG5.0
Total Scaffold Length (bp) 801,812,098
Number of Scaffolds 13
Min. Number of Scaffolds containing half of assembly (L50) 6
Shortest Scaffold from L50 set (N50) 67,567,563
Total Contig Length (bp) 801,782,098
Number of Contigs 73
Min. Number of Contigs containing half of assembly (L50) 9
Shortest Contig from L50 set (N50) 41,697,488
Number of Protein-coding Transcripts 43,752
Number of Protein-coding Genes 36,648
Percentage of Eukaryote BUSCO Genes 94.7
Percentage of Embryophyte BUSCO Genes 93.5
Assembly level Chromosome
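
The N50 and L50 fields above follow the usual definitions: sort the sequences from longest to shortest, accumulate lengths until half of the total is reached, then report the number of sequences used (L50) and the length of the last one added (N50). A minimal Python sketch of that calculation is shown below. The function name and the toy lengths are illustrative and are not taken from the SL5.0 assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # `length` is the shortest sequence in the set covering half
            # of the assembly; `count` is the size of that set.
            return length, count
    raise ValueError("lengths must not be empty")


# Toy lengths for illustration only (not SL5.0 values).
toy_n50, toy_l50 = n50_l50([50, 40, 30, 20, 10])

# Arithmetic on the figures reported above: the scaffold and contig totals
# differ by 30,000 bp across 73 - 13 = 60 joins. This is consistent with a
# fixed gap of 500 bp per join, which is an inference, not a stated fact.
gap_total = 801_812_098 - 801_782_098
joins = 73 - 13
```

Applying the same function to the per-scaffold lengths of the downloaded assembly should reproduce the reported scaffold N50 and L50, provided the file contains the same 13 scaffolds.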

Assembly

The Solanum lycopersicum 'Heinz 1706 (cultivar)' SL5.0 Assembly file is available in FASTA format.

Downloads

Chromosomes (FASTA file) Solanum_lycopersicum_Heinz1706_SL5.0.fasta.gz
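
As a quick check after download, the per-sequence lengths can be read straight from the gzip-compressed FASTA file with the Python standard library. The sketch below assumes the file has been saved locally under the name listed above; the function name is hypothetical.

```python
import gzip


def fasta_lengths(path):
    """Map each FASTA record ID to its sequence length (gzip-compressed input)."""
    lengths = {}
    current = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the ID.
                parts = line[1:].split()
                current = parts[0] if parts else ""
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


# Example call, assuming the file is in the working directory:
# lengths = fasta_lengths("Solanum_lycopersicum_Heinz1706_SL5.0.fasta.gz")
```

The sum of the returned lengths can then be compared with the Total Scaffold Length reported in the assembly statistics.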

Gene Predictions

The Solanum lycopersicum 'Heinz 1706 (cultivar)' SL5.0 genome gene prediction files are available in GFF3 and FASTA formats.

Downloads

Genes (GFF3 file) Slycopersicum_796_ITAG5.0.gene.gff3.gz
CDS sequences (FASTA file) Slycopersicum_796_ITAG5.0.cds.fa.gz
Protein sequences (FASTA file) Slycopersicum_796_ITAG5.0.protein.fa.gz
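
One way to sanity-check the GFF3 download is to tally the feature types in column 3 and compare the totals with the gene and transcript counts in the assembly statistics. The sketch below assumes the file uses the common feature types "gene" and "mRNA", which has not been verified against this release; the function name is hypothetical.

```python
import gzip
from collections import Counter


def count_feature_types(path):
    """Count GFF3 feature types (column 3) in a gzip-compressed file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip directives, comments, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # A GFF3 feature line has exactly nine tab-separated columns.
            if len(fields) == 9:
                counts[fields[2]] += 1
    return counts


# Example call, assuming the file is in the working directory:
# counts = count_feature_types("Slycopersicum_796_ITAG5.0.gene.gff3.gz")
# counts["gene"] and counts["mRNA"] can be compared with the reported
# 36,648 protein-coding genes and 43,752 protein-coding transcripts.
```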

Functional Analysis

Functional annotation for the Solanum lycopersicum 'Heinz 1706 (cultivar)' SL5.0 assembly is available for download below. The proteins were analyzed with InterProScan to assign InterPro domains (Pfam).

Downloads

Domain from InterProScan Solanum_lycopersicum_Heinz1706_SL5.0.Pfam.tsv.gz
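
The column layout of the Pfam file is not described on this page. The sketch below assumes it follows the standard InterProScan TSV output, in which column 1 is the protein accession and column 5 is the signature (here Pfam) accession. If the file is laid out differently, the two column indices, exposed as parameters, need adjusting. The function name is hypothetical.

```python
import gzip
from collections import defaultdict


def proteins_by_signature(path, protein_col=0, signature_col=4):
    """Group protein IDs by signature accession from a gzip-compressed TSV.

    Column indices are zero-based and default to the standard InterProScan
    TSV layout, which is an assumption about this particular file.
    """
    groups = defaultdict(set)
    needed = max(protein_col, signature_col)
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) > needed:
                groups[fields[signature_col]].add(fields[protein_col])
    return dict(groups)


# Example call, assuming the file is in the working directory:
# groups = proteins_by_signature("Solanum_lycopersicum_Heinz1706_SL5.0.Pfam.tsv.gz")
```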

S genes

Summary

Query | Chr | Size (bp) | Coordinates | BLASTn Hit | BLASTn %ID | Domain
SLF15 | 1 | 93364382 | 2216983-2215724 | SL2.31ch01:2198500-2196501_SLF15 | 100 | F-box domain
SLF16 | 1 | 93364382 | 2738960-2737779 | SL2.31ch01:2723400-2721301_SLF16 | 100 | F-box domain
SLF17Ψ | 1 | 93364382 | 43356346-43355261 | SL2.31ch01:40853100-40851001_SLF17Ψ | 100 | -
SLF1 | 1 | 93364382 | 46368379-46369548 | NM_001301439.2, SLF1 | 100 | F-box domain
S-RNase | 1 | 93364382 | 47178015-47177776, 47177678-47177253 | XM_004229015.1, Ribonuclease S-3 | 100 | Ribonuclease T2 family
SLF2Ψ | 1 | 93364382 | 48064926-48063745 | KJ814870.1, SLF2 | 100 | -
SLF12Ψ | 1 | 93364382 | 48121492-48122623 | SL2.31ch01:45516501-45518600_SLF12Ψ | 100 | -
SLF4Ψ | 1 | 93364382 | 48188795-48187629 | KJ814943.1, SLF4 | 100 | -
SLF5Ψ | 1 | 93364382 | 48269494-48268326 | KJ814872.1, SLF5 | 100 | -
SLF6Ψ | 1 | 93364382 | 48287030-48285885 | KJ814944.1, SLF6 | 100 | -
SLF8Ψ | 1 | 93364382 | 48844367-48843199 | SL2.31ch01:46243000-46240701_SLF8Ψ | 100 | -
SLF7Ψ | 1 | 93364382 | 48869148-48868051 | SL2.31ch01:46267800-46265701_SLF7Ψ | 100 | -
SLF9 | 1 | 93364382 | 51037284-51036220 | NM_001329461.2, SLF9 | 100 | F-box domain
SLF10Ψ | 1 | 93364382 | 51482478-51483709 | KJ814899.1, SLF10 | 100 | -
SLF11 | 1 | 93364382 | 53433949-53435121 | KJ814877.1, SLF11 | 100 | F-box associated
SLF12 | 1 | 93364382 | 55193039-55191876 | NM_001301441.1, SLF12 | 100 | F-box associated
SLF13 | 1 | 93364382 | 56038566-56037373 | NM_001301435.1, SLF13 | 100 | F-box associated
SLF14Ψ | 1 | 93364382 | 59249426-59248256 | KJ814903.1, SLF14 | 100 | -
SLF18 | 1 | 93364382 | 70407939-70409054 | SL2.31ch01:67739501-67741500_SLF18 | 100 | F-box domain
SLF19 | 1 | 93364382 | 70426963-70425854 | SL2.31ch01:67757501-67759600_SLF19 | 100 | F-box domain
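
Several entries in the Coordinates column list the larger position first, and the S-RNase entry lists two comma-separated segments. The Python sketch below reads such a value under two assumptions that this page does not state: that coordinates are 1-based and inclusive, and that a start greater than the end indicates the reverse strand. The function name is hypothetical.

```python
def parse_coordinates(text):
    """Return (strand, total_length, segments) for a Coordinates value.

    Assumes 1-based inclusive positions and that start > end means the
    feature lies on the reverse strand (both are assumptions).
    """
    segments = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        start, end = (int(value) for value in part.split("-"))
        segments.append((start, end))
    if not segments:
        raise ValueError("no coordinates found")
    strands = {"+" if start <= end else "-" for start, end in segments}
    if len(strands) != 1:
        raise ValueError("segments have mixed orientation")
    total = sum(abs(end - start) + 1 for start, end in segments)
    return strands.pop(), total, segments


slf15 = parse_coordinates("2216983-2215724")
s_rnase = parse_coordinates("47178015-47177776, 47177678-47177253")
```

Under these assumptions, SLF15 spans 1,260 bp on the reverse strand, and the two S-RNase segments span 240 bp and 426 bp, or 666 bp in total.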

© 2023 National Genomics Data Center, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences