Analysis Name | Solanum lycopersicum 'Heinz 1706 (cultivar)' SL5.0 Assembly & Annotation |
Sequencing technology | HiFi reads and Hi-C reads |
Assembly method | Flye v.2.7, HiCanu v.2.0 and hifiasm v.0.13 |
Release Date | 2022-06-08 |
Zhou, Y., Zhang, Z., Bao, Z., Li, H., Lyu, Y., Zan, Y., … Huang, S. (2022). Graph pangenome captures missing heritability and empowers tomato breeding. Nature, 606(7914), 527–534. https://www.nature.com/articles/s41586-022-04808-9.
Abstract
Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
Assembly statistics
Assembly Source | AGI_CAAS_Shenzhen |
Assembly Version | SL5.0 |
Annotation Source | AGI_CAAS_Shenzhen |
Annotation Version | ITAG5.0 |
Total Scaffold Length (bp) | 801,812,098 |
Number of Scaffolds | 13 |
Min. Number of Scaffolds containing half of assembly (L50) | 6 |
Shortest Scaffold from L50 set (N50) | 67,567,563 |
Total Contig Length (bp) | 801,782,098 |
Number of Contigs | 73 |
Min. Number of Contigs containing half of assembly (L50) | 9 |
Shortest Contig from L50 set (N50) | 41,697,488 |
Number of Protein-coding Transcripts | 43,752 |
Number of Protein-coding Genes | 36,648 |
Percentage of Eukaryote BUSCO Genes | 94.7 |
Percentage of Embryophyte BUSCO Genes | 93.5 |
Assembly level | Chromosome |
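The L50 and N50 rows above follow the usual definitions: sort the sequences from longest to shortest and accumulate their lengths until at least half of the total is covered. L50 is the number of sequences needed, and N50 is the length of the last one added. The following is a minimal Python sketch of that calculation. The function name `n50_l50` and the toy lengths are illustrative assumptions, not values or code from the SL5.0 release.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    half = sum(ordered) / 2                  # half of the total assembly length
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is the shortest sequence in the set covering half the total
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy lengths for illustration only (not the SL5.0 scaffolds).
toy_lengths = [100, 80, 60, 40, 20]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)  # 80 2
```

As a separate sanity check on the table, the scaffold and contig totals differ by 801,812,098 − 801,782,098 = 30,000 bp. Assuming all 73 contigs are placed in the 13 scaffolds, that implies 60 joins, so the figures are consistent with an average of 500 bp of gap per join.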
The Solanum lycopersicum 'Heinz 1706 (cultivar)' SL5.0 Assembly file is available in FASTA format.
Downloads
Chromosomes (FASTA file) | Solanum_lycopersicum_Heinz1706_SL5.0.fasta.gz |
The Solanum lycopersicum 'Heinz 1706 (cultivar)' SL5.0 genome gene prediction files are available in GFF3 and FASTA formats.
Downloads
Genes (GFF3 file) | Slycopersicum_796_ITAG5.0.gene.gff3.gz |
CDS sequences (FASTA file) | Slycopersicum_796_ITAG5.0.cds.fa.gz |
Protein sequences (FASTA file) | Slycopersicum_796_ITAG5.0.protein.fa.gz |
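After downloading, one quick check is to tally the feature types in the GFF3 file and compare them with the gene and transcript counts reported under Assembly statistics. The sketch below relies only on the general GFF3 layout (nine tab-separated columns, feature type in column 3, comment lines starting with `#`). The function names are hypothetical, and the assumption that this particular file uses `gene` and `mRNA` as feature types has not been verified against the release.

```python
import gzip
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 feature types (column 3) from an iterable of text lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature line
        counts[fields[2]] += 1
    return counts


def count_feature_types_in_file(path):
    """Same tally, reading a gzip-compressed GFF3 file from disk."""
    with gzip.open(path, "rt") as handle:
        return count_feature_types(handle)


# Toy records for illustration only (not taken from ITAG5.0).
toy = [
    "##gff-version 3\n",
    "chr1\ttoy\tgene\t100\t900\t.\t+\t.\tID=g1\n",
    "chr1\ttoy\tmRNA\t100\t900\t.\t+\t.\tID=g1.1;Parent=g1\n",
    "chr1\ttoy\tmRNA\t100\t700\t.\t+\t.\tID=g1.2;Parent=g1\n",
]
toy_counts = count_feature_types(toy)
print(toy_counts["gene"], toy_counts["mRNA"])  # 1 2
```

For the real file, `count_feature_types_in_file("Slycopersicum_796_ITAG5.0.gene.gff3.gz")` would return the same kind of tally, assuming the file has been saved in the working directory.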
Functional annotation for the Solanum lycopersicum 'Heinz 1706 (cultivar)' SL5.0 assembly is available for download below. The proteins were analyzed using InterProScan to assign InterPro domains (Pfam).
Downloads
Domain from InterProScan | Solanum_lycopersicum_Heinz1706_SL5.0.Pfam.tsv.gz |
Summary
Query | Chr | Size(bp) | Coordinates | BLASTn Hit | BLASTn %ID | Domain |
SLF15 | 1 | 93364382 | 2216983-2215724 | SL2.31ch01:2198500-2196501_SLF15 | 100 | F-box domain |
SLF16 | 1 | 93364382 | 2738960-2737779 | SL2.31ch01:2723400-2721301_SLF16 | 100 | F-box domain |
SLF17Ψ | 1 | 93364382 | 43356346-43355261 | SL2.31ch01:40853100-40851001_SLF17Ψ | 100 | - |
SLF1 | 1 | 93364382 | 46368379-46369548 | NM_001301439.2, SLF1 | 100 | F-box domain |
S-RNase | 1 | 93364382 | 47178015-47177776,47177678-47177253 | XM_004229015.1, Ribonuclease S-3 | 100 | Ribonuclease T2 family |
SLF2Ψ | 1 | 93364382 | 48064926-48063745 | KJ814870.1, SLF2 | 100 | - |
SLF12Ψ | 1 | 93364382 | 48121492-48122623 | SL2.31ch01:45516501-45518600_SLF12Ψ | 100 | - |
SLF4Ψ | 1 | 93364382 | 48188795-48187629 | KJ814943.1, SLF4 | 100 | - |
SLF5Ψ | 1 | 93364382 | 48269494-48268326 | KJ814872.1, SLF5 | 100 | - |
SLF6Ψ | 1 | 93364382 | 48287030-48285885 | KJ814944.1, SLF6 | 100 | - |
SLF8Ψ | 1 | 93364382 | 48844367-48843199 | SL2.31ch01:46243000-46240701_SLF8Ψ | 100 | - |
SLF7Ψ | 1 | 93364382 | 48869148-48868051 | SL2.31ch01:46267800-46265701_SLF7Ψ | 100 | - |
SLF9 | 1 | 93364382 | 51037284-51036220 | NM_001329461.2, SLF9 | 100 | F-box domain |
SLF10Ψ | 1 | 93364382 | 51482478-51483709 | KJ814899.1, SLF10 | 100 | - |
SLF11 | 1 | 93364382 | 53433949-53435121 | KJ814877.1, SLF11 | 100 | F-box associated |
SLF12 | 1 | 93364382 | 55193039-55191876 | NM_001301441.1, SLF12 | 100 | F-box associated |
SLF13 | 1 | 93364382 | 56038566-56037373 | NM_001301435.1, SLF13 | 100 | F-box associated |
SLF14Ψ | 1 | 93364382 | 59249426-59248256 | KJ814903.1, SLF14 | 100 | - |
SLF18 | 1 | 93364382 | 70407939-70409054 | SL2.31ch01:67739501-67741500_SLF18 | 100 | F-box domain |
SLF19 | 1 | 93364382 | 70426963-70425854 | SL2.31ch01:67757501-67759600_SLF19 | 100 | F-box domain |
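The Coordinates column mixes ascending and descending ranges, and the S-RNase row lists two comma-separated segments. The sketch below shows one way to turn such a string into a strand, a list of segments and a total length. It rests on two assumptions that the table does not state: a range written with start greater than end lies on the reverse strand, and coordinates are 1-based and inclusive. The function name `parse_coordinates` is hypothetical.

```python
def parse_coordinates(text):
    """Parse 'start-end[,start-end...]' into (strand, segments, total_length)."""
    segments = []
    for part in text.split(","):
        start, end = (int(value) for value in part.strip().split("-"))
        segments.append((start, end))

    # Assumption: start > end marks a reverse-strand feature.
    descending = [start > end for start, end in segments]
    if all(descending):
        strand = "-"
    elif not any(descending):
        strand = "+"
    else:
        raise ValueError("segments disagree on orientation: " + text)

    # Assumption: coordinates are 1-based and inclusive, hence the +1.
    total_length = sum(abs(end - start) + 1 for start, end in segments)
    return strand, segments, total_length


# Rows taken from the Summary table above.
print(parse_coordinates("46368379-46369548"))                    # SLF1
print(parse_coordinates("2216983-2215724"))                      # SLF15
print(parse_coordinates("47178015-47177776,47177678-47177253"))  # S-RNase
```

Under these assumptions SLF1 is a forward-strand feature of 1,170 bp, SLF15 a reverse-strand feature of 1,260 bp, and the two S-RNase segments sum to 666 bp on the reverse strand.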