Solanum lycopersicum gwh Tao_Lin Assembly & Annotation

Overview

Analysis Name: Solanum lycopersicum gwh Tao_Lin Assembly & Annotation
Sequencing technology: PacBio
Assembly method: CANU version 1.3
Release Date: 2022-01-06
Reference Publication(s)

Su X, Wang B, Geng X, Du Y, Yang Q, Liang B, Meng G, Gao Q, Yang W, Zhu Y, Lin T. A high-continuity and annotated tomato reference genome. BMC Genomics. 2021 Dec 15;22(1):898. doi: 10.1186/s12864-021-08212-x.

Abstract

Background
Genetic and functional genomics studies require a high-quality genome assembly. Tomato (Solanum lycopersicum), an important horticultural crop, is an ideal model species for the study of fruit development.
Results
Here, we assembled an updated reference genome of S. lycopersicum cv. Heinz 1706 that was 799.09 Mb in length, containing 34,384 predicted protein-coding genes and 65.66% repetitive sequences. By comparing the genomes of S. lycopersicum and S. pimpinellifolium LA2093, we found a large number of genomic fragments probably associated with human selection, which may have had crucial roles in the domestication of tomato. We also used a recombinant inbred line (RIL) population to generate a high-density genetic map with high resolution and accuracy. Using these resources, we identified a number of candidate genes that were likely to be related to important agronomic traits in tomato.
Conclusion
Our results offer opportunities for understanding the evolution of the tomato genome and will facilitate the study of genetic mechanisms in tomato biology.

Assembly statistics

Genome size (bp): 799,091,949
Chromosomes sequence No.: 12
Genome sequence No.: 83
Maximum genome sequence length (bp): 91,866,112
Minimum genome sequence length (bp): 16,018
Average genome sequence length (bp): 9,627,614
Genome sequence N50 (bp): 66,166,780
Genome sequence N90 (bp): 54,674,139
Assembly level: Chromosome
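
For readers who want to recompute statistics of this kind, the following is a minimal Python sketch of the usual N50/N90 definition: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of the assembly. The function name `nx` and the toy lengths are illustrative assumptions, not part of the published pipeline. The average is simply total size divided by sequence count, which for the figures above is 799,091,949 / 83, about 9,627,614 bp.

```python
from typing import Iterable


def nx(lengths: Iterable[int], fraction: float) -> int:
    """Return the Nx length for a collection of sequence lengths.

    The result is the shortest sequence in the smallest set of longest
    sequences whose combined length reaches `fraction` of the total.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("no sequence lengths supplied")


# Toy lengths for illustration only, not the real assembly.
toy_lengths = [10, 8, 6, 4, 2]
print("N50:", nx(toy_lengths, 0.5))
print("N90:", nx(toy_lengths, 0.9))
print("Average:", sum(toy_lengths) / len(toy_lengths))
```

Applied to the 83 sequence lengths of the assembly, the same function would be expected to give values comparable to the N50 and N90 listed above, provided the page uses this common definition.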

Assembly

The Solanum lycopersicum gwh Tao_Lin Assembly file is available in FASTA format.

Downloads

Chromosomes (FASTA file): GWHBAUD00000000.genome.fasta.gz
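
As a rough sketch of how the downloaded genome file could be inspected, the Python below reads FASTA records and tallies the length of each sequence using only the standard library. The helper names `fasta_lengths` and `gz_fasta_lengths` are hypothetical, and the demo records are made up; the sketch assumes the download is an ordinary gzip-compressed FASTA file.

```python
import gzip
import io
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name (first word of the header) to its length."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def gz_fasta_lengths(path: str) -> Dict[str, int]:
    """Same as fasta_lengths, but reading a gzip-compressed file from disk."""
    with gzip.open(path, "rt") as handle:
        return fasta_lengths(handle)


# In-memory demo with made-up records.
demo = io.StringIO(">seqA demo record\nACGT\nAC\n>seqB\nGGG\n")
print(fasta_lengths(demo))

# With the real download in the working directory, one might call:
# lengths = gz_fasta_lengths("GWHBAUD00000000.genome.fasta.gz")
```

The returned lengths can be compared against the sequence count and the maximum and minimum lengths reported under Assembly statistics.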

Gene Predictions

The Solanum lycopersicum gwh Tao_Lin genome gene prediction files are available in GFF3 and FASTA format.

Downloads

Genes (GFF3 file): GWHBAUD00000000.gff.gz
CDS sequences (FASTA file): GWHBAUD00000000.CDS.fasta.gz
Protein sequences (FASTA file): GWHBAUD00000000.Protein.faa.gz
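
A quick way to sanity-check the gene prediction file is to count features by type. The sketch below relies only on the standard GFF3 layout (nine tab-separated columns, with the feature type in the third). The function names are hypothetical, the demo records are invented, and whether protein-coding genes in this particular file are labelled `gene` is an assumption that should be checked against the file itself.

```python
import collections
import gzip
import io
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> collections.Counter:
    """Count GFF3 records by feature type (column 3)."""
    counts: collections.Counter = collections.Counter()
    for line in lines:
        # Skip directives, comments, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed GFF3 record
        counts[columns[2]] += 1
    return counts


def count_feature_types_gz(path: str) -> collections.Counter:
    """Same count, reading a gzip-compressed GFF3 file from disk."""
    with gzip.open(path, "rt") as handle:
        return count_feature_types(handle)


# Invented records for illustration only.
demo_gff = (
    "##gff-version 3\n"
    "chrDemo\tdemo\tgene\t1\t900\t.\t+\t.\tID=geneA\n"
    "chrDemo\tdemo\tmRNA\t1\t900\t.\t+\t.\tID=mrnaA;Parent=geneA\n"
    "chrDemo\tdemo\tCDS\t1\t300\t.\t+\t0\tID=cdsA;Parent=mrnaA\n"
    "chrDemo\tdemo\tgene\t2000\t2600\t.\t-\t.\tID=geneB\n"
)
print(count_feature_types(io.StringIO(demo_gff)))

# With the real download, one might call:
# counts = count_feature_types_gz("GWHBAUD00000000.gff.gz")
```

If the file uses `gene` for protein-coding loci, that count could be compared with the 34,384 predicted protein-coding genes reported in the abstract.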

Functional Analysis

Functional annotation for the Solanum lycopersicum gwh Tao_Lin assembly is available for download below. The proteins were analyzed using InterProScan to assign InterPro domains (Pfam).

Downloads

Domain from InterProScan: Solanum_lycopersicum_gwh_Tao_Lin.Pfam.tsv.gz

S genes

Summary

| Query | Chr | Size (bp) | Coordinates | BLASTn Hit | BLASTn %ID | Domain |
|---|---|---|---|---|---|---|
| SLF15 | GWHBAUD00000001 | 91866112 | 2144101-2142842 | SL2.31ch01:2198500-2196501_SLF15 | 100 | F-box domain |
| SLF16 | GWHBAUD00000001 | 91866112 | 2666068-2664887 | SL2.31ch01:2723400-2721301_SLF16 | 100 | F-box domain |
| SLF17Ψ | GWHBAUD00000001 | 91866112 | 41862564-41861479 | SL2.31ch01:40853100-40851001_SLF17Ψ | 100 | - |
| SLF1 | GWHBAUD00000001 | 91866112 | 44862998-44864167 | NM_001301439.2, SLF1 | 100 | F-box domain |
| S-RNase | GWHBAUD00000001 | 91866112 | 45672592-45672353; 45672255-45671830 | XM_004229015.1, Ribonuclease S-3 | 100 | Ribonuclease T2 family |
| SLF2Ψ | GWHBAUD00000001 | 91866112 | 46559497-46558316 | KJ814870.1, SLF2 | 100 | - |
| SLF12Ψ | GWHBAUD00000001 | 91866112 | 46616065-46617196 | SL2.31ch01:45516501-45518600_SLF12Ψ | 100 | - |
| SLF4Ψ | GWHBAUD00000001 | 91866112 | 46683367-46682201 | KJ814943.1, SLF4 | 100 | - |
| SLF5Ψ | GWHBAUD00000001 | 91866112 | 46764066-46762898 | KJ814872.1, SLF5 | 100 | - |
| SLF6Ψ | GWHBAUD00000001 | 91866112 | 46781602-46780457 | KJ814944.1, SLF6 | 100 | - |
| SLF8Ψ | GWHBAUD00000001 | 91866112 | 47338932-47337764 | SL2.31ch01:46243000-46240701_SLF8Ψ | 100 | - |
| SLF7Ψ | GWHBAUD00000001 | 91866112 | 47363711-47362614 | SL2.31ch01:46267800-46265701_SLF7Ψ | 100 | - |
| SLF9 | GWHBAUD00000001 | 91866112 | 49531820-49530756 | NM_001329461.2, SLF9 | 100 | F-box domain |
| SLF10Ψ | GWHBAUD00000001 | 91866112 | 49977014-49978245 | KJ814899.1, SLF10 | 100 | - |
| SLF11 | GWHBAUD00000001 | 91866112 | 51928457-51929629 | KJ814877.1, SLF11 | 100 | F-box associated |
| SLF12 | GWHBAUD00000001 | 91866112 | 53687508-53686345 | NM_001301441.1, SLF12 | 100 | F-box associated |
| SLF13 | GWHBAUD00000001 | 91866112 | 54533010-54531817 | NM_001301435.1, SLF13 | 100 | F-box associated |
| SLF14Ψ | GWHBAUD00000001 | 91866112 | 57743756-57742586 | KJ814903.1, SLF14 | 100 | - |
| SLF18 | GWHBAUD00000001 | 91866112 | 68901807-68902922 | SL2.31ch01:67739501-67741500_SLF18 | 100 | F-box domain |
| SLF19 | GWHBAUD00000001 | 91866112 | 68920831-68919722 | SL2.31ch01:67757501-67759600_SLF19 | 100 | F-box domain |
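
Several entries in the Coordinates column list a start larger than the end (for example 2144101-2142842 for SLF15). A plausible reading, which the page does not state explicitly, is that such hits lie on the reverse strand. The Python sketch below shows one way to pull a region out of a chromosome sequence under two assumptions: coordinates are 1-based and inclusive, and a descending range means the reverse complement is wanted. All function names are hypothetical. It handles one range at a time, so a multi-range entry such as S-RNase would need one call per range.

```python
from typing import Tuple

# Translation table for complementing DNA; N is left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def parse_coordinates(text: str) -> Tuple[int, int, str]:
    """Split 'start-end' into (low, high, strand).

    Assumption: a descending range marks the minus strand.
    """
    start, end = (int(part) for part in text.split("-"))
    strand = "+" if start <= end else "-"
    return min(start, end), max(start, end), strand


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA string."""
    return sequence.translate(_COMPLEMENT)[::-1]


def extract_region(sequence: str, coordinates: str) -> str:
    """Extract a region given 1-based inclusive coordinates (assumed)."""
    low, high, strand = parse_coordinates(coordinates)
    region = sequence[low - 1:high]
    return reverse_complement(region) if strand == "-" else region


# Coordinates taken from the SLF15 row; the length follows from the range.
low, high, strand = parse_coordinates("2144101-2142842")
print(low, high, strand, high - low + 1)

# Made-up sequence to show both orientations.
toy = "ACGTTGCA"
print(extract_region(toy, "2-4"))
print(extract_region(toy, "4-2"))
```

Loading the chromosome sequence itself, for instance from the genome FASTA download, is left out here to keep the sketch short.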

© 2023 National Genomics Data Center, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences