Solanum aethiopicum Assembly & Annotation

Overview

Analysis Name: Solanum aethiopicum Assembly & Annotation
Sequencing technology: Illumina
Assembly method: SOAPdenovo2
Release Date: 2019-10-01
Reference Publication(s):

Song B, Song Y, Fu Y, Kizito EB, Kamenya SN, Kabod PN, Liu H, Muthemba S, Kariba R, Njuguna J, Maina S, Stomeo F, Djikeng A, Hendre PS, Chen X, Chen W, Li X, Sun W, Wang S, Cheng S, Muchugi A, Jamnadass R, Shapiro HY, Van Deynze A, Yang H, Wang J, Xu X, Odeny DA, Liu X. Draft genome sequence of Solanum aethiopicum provides insights into disease resistance, drought tolerance, and the evolution of the genome. Gigascience. 2019 Oct 1;8(10):giz115. doi: 10.1093/gigascience/giz115.

Abstract

Background: The African eggplant (Solanum aethiopicum) is a nutritious traditional vegetable used in many African countries, including Uganda and Nigeria. It is thought to have been domesticated in Africa from its wild relative, Solanum anguivi. S. aethiopicum has been routinely used as a source of disease resistance genes for several Solanaceae crops, including Solanum melongena. A lack of genomic resources has meant that breeding of S. aethiopicum has lagged behind that of other vegetable crops.

Results: We assembled a 1.02-Gb draft genome of S. aethiopicum, which contained predominantly repetitive sequences (78.9%). We annotated 37,681 gene models, including 34,906 protein-coding genes. Expansion of disease resistance genes was observed via 2 rounds of amplification of long terminal repeat retrotransposons, which may have occurred ∼1.25 and 3.5 million years ago, respectively. By resequencing 65 S. aethiopicum and S. anguivi genotypes, 18,614,838 single-nucleotide polymorphisms were identified, of which 34,171 were located within disease resistance genes. Analysis of domestication and demographic history revealed active selection for genes involved in drought tolerance in both “Gilo” and “Shum” groups. A pan-genome of S. aethiopicum was assembled, containing 51,351 protein-coding genes; 7,069 of these genes were missing from the reference genome.

Conclusions: The genome sequence of S. aethiopicum enhances our understanding of its biotic and abiotic resistance. The single-nucleotide polymorphisms identified are immediately available for use by breeders. The information provided here will accelerate selection and breeding of the African eggplant, as well as other crops within the Solanaceae family.

Assembly statistics

Scaffold number: 162,187
Scaffold total length: 1.02 Gb
Scaffold N50: 516.1 kb
Longest scaffold: 2.94 Mb
Contig number: 231,821
Contig total length: 936 Mb
Contig N50: 25.2 kb
Longest contig: 366.2 kb
GC content: 33.13%
Number of protein-coding genes: 34,906
Total length of transposable elements: 805.7 Mb (78.23%)
Assembly level: Scaffold

Assembly

The Solanum aethiopicum Assembly file is available in FASTA format.

Downloads

Scaffolds (FASTA file) Solanum_aethiopicum.genome.fa.gz
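The assembly is distributed as a gzip-compressed FASTA file. A minimal streaming reader, sketched below in Python with only the standard library, avoids decompressing the roughly 1 Gb file to disk. It assumes the record ID is the first whitespace-delimited token of each header line; the function name `read_fasta` is hypothetical.

```python
import gzip
from typing import Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a plain or gzip-compressed FASTA file.

    Only one record is held in memory at a time, so the whole genome never
    needs to be loaded or decompressed to disk.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        record_id = None
        chunks = []
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if record_id is not None:
                    yield record_id, "".join(chunks)
                header = line[1:].strip()
                # assumption: the ID is the first whitespace-delimited token
                record_id = header.split()[0] if header else ""
                chunks = []
            elif record_id is not None:
                chunks.append(line)  # sequence lines may be wrapped
        if record_id is not None:
            yield record_id, "".join(chunks)
```

For example, `[len(seq) for _, seq in read_fasta("Solanum_aethiopicum.genome.fa.gz")]` collects per-scaffold lengths; the same reader works for the CDS and protein FASTA files listed under Gene Predictions.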

Gene Predictions

Gene predictions for the Solanum aethiopicum genome are available in GFF3 and FASTA formats.

Downloads

Genes (GFF3 file) Solanum_aethiopicum.gene.gff.gz
CDS sequences (FASTA file) Solanum_aethiopicum.gene.cds.fa.gz
Protein sequences (FASTA file) Solanum_aethiopicum.gene.pep.fa.gz
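GFF3 is a tab-separated format with nine columns (seqid, source, type, start, end, score, strand, phase, attributes). The sketch below parses feature lines and tallies feature types, one way to cross-check gene counts against the assembly statistics. It assumes the file follows the GFF3 specification (percent-encoded attribute values, `#` comment and directive lines); the exact feature types present in Solanum_aethiopicum.gene.gff.gz (for example `gene` versus `mRNA`) are not confirmed here, and the function names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Dict, Iterable, Iterator
from urllib.parse import unquote


def parse_attributes(field: str) -> Dict[str, str]:
    """Turn a GFF3 column-9 string like 'ID=g1;Parent=g0' into a dict."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs


def iter_gff3(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per feature line, skipping comments and directives."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # e.g. an embedded ##FASTA section; ignored in this sketch
        seqid, _source, ftype, start, end, _score, strand, _phase, attributes = cols
        yield {
            "seqid": seqid,
            "type": ftype,
            "start": int(start),
            "end": int(end),
            "strand": strand,
            "attributes": parse_attributes(attributes),
        }


def count_feature_types(path: str) -> Counter:
    """Tally column-3 feature types (gene, mRNA, CDS, ...) in a GFF3 file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return Counter(feature["type"] for feature in iter_gff3(handle))
```

Calling `count_feature_types("Solanum_aethiopicum.gene.gff.gz")` would report how many features of each type the file contains.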

Functional Analysis

Functional annotation for the Solanum aethiopicum genome is available for download below. The predicted proteins were analyzed with InterProScan to assign protein domains from the Pfam database.

Downloads

Domain from InterProScan Solanum_aethiopicum.Pfam.tsv.gz
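InterProScan's TSV output has no header line; each row describes one match, and the first eight columns hold the protein accession, sequence MD5, sequence length, analysis (e.g. Pfam), signature accession, signature description, and match start and stop. Assuming Solanum_aethiopicum.Pfam.tsv.gz follows that layout, the sketch below loads the matches and groups them by protein for a chosen signature, for example PF00646, the Pfam F-box domain family. The class and function names are hypothetical.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class DomainHit(NamedTuple):
    protein: str      # column 1: protein accession
    analysis: str     # column 4: member database, e.g. "Pfam"
    accession: str    # column 5: signature accession, e.g. "PF00646"
    description: str  # column 6: signature description
    start: int        # column 7: match start on the protein
    stop: int         # column 8: match end on the protein


def parse_interproscan_tsv(lines: Iterable[str]) -> List[DomainHit]:
    """Parse InterProScan TSV rows, keeping the first eight columns."""
    hits = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # blank or truncated line
        hits.append(DomainHit(cols[0], cols[3], cols[4], cols[5], int(cols[6]), int(cols[7])))
    return hits


def proteins_with_domain(hits: Iterable[DomainHit], accession: str) -> Dict[str, List[DomainHit]]:
    """Group the matches to one signature accession by protein ID."""
    grouped: Dict[str, List[DomainHit]] = defaultdict(list)
    for hit in hits:
        if hit.accession == accession:
            grouped[hit.protein].append(hit)
    return dict(grouped)


def load(path: str) -> List[DomainHit]:
    """Read a plain or gzip-compressed InterProScan TSV file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_interproscan_tsv(handle)
```

For example, `proteins_with_domain(load("Solanum_aethiopicum.Pfam.tsv.gz"), "PF00646")` would list the proteins carrying an F-box match together with their domain coordinates.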

S genes

Summary

Query | Scaffold | Scaffold size (bp) | Coordinates | BLASTn hit | BLASTn identity (%) | Domain
SLF18 | scaffold141504_cov61 | 1,637,287 | 1394419-1395519 | Solanum tuberosum DM8.1 SLF18 | 86.6 | F-box domain
SLF19 | scaffold149241_cov62 | 1,425,273 | 999715-1000830 | Solanum tuberosum DM8.1 SLF19 | 89.9 | F-box domain
SLF15 | scaffold149356_cov63 | 2,300,088 | 790925-792190 | Solanum tuberosum DM8.1 SLF15 | 84.5 | F-box domain
SLF16 | scaffold149356_cov63 | 2,300,088 | 2077817-2076645 | Solanum tuberosum DM8.1 SLF16 | 87.5 | F-box domain
SLF23ψ | scaffold150928_cov63 | 230,831 | 9914-8757 | Solanum lycopersicoides KU960925.1, SLF23 | 84.4 | -
SLF11ψ | scaffold150938_cov63 | 702,656 | 443047-444222 | Solanum tuberosum DM8.1 SLF11 | 87.4 | -

ψ denotes a putative pseudogene.
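For SLF16 and SLF23ψ the first coordinate is larger than the second, which in BLAST-style output usually indicates a hit on the reverse strand. The sketch below encodes the six S-gene rows and derives orientation and span for each. It assumes 1-based, inclusive coordinates and that start > end means the minus strand; the class and function names are hypothetical. Checking each hit against its scaffold size doubles as a sanity check on the table values.

```python
from typing import NamedTuple, Tuple


class SGeneHit(NamedTuple):
    query: str
    scaffold: str
    scaffold_size: int
    start: int  # first coordinate as listed in the table
    end: int    # second coordinate as listed in the table


def strand_and_span(hit: SGeneHit) -> Tuple[str, int]:
    """Infer orientation and inclusive length from the listed coordinates.

    Assumes 1-based inclusive coordinates and that start > end marks a
    reverse-strand (minus) hit.
    """
    strand = "+" if hit.start <= hit.end else "-"
    low, high = sorted((hit.start, hit.end))
    if high > hit.scaffold_size:
        raise ValueError(f"{hit.query}: coordinates exceed scaffold length")
    return strand, high - low + 1


HITS = [
    SGeneHit("SLF18", "scaffold141504_cov61", 1637287, 1394419, 1395519),
    SGeneHit("SLF19", "scaffold149241_cov62", 1425273, 999715, 1000830),
    SGeneHit("SLF15", "scaffold149356_cov63", 2300088, 790925, 792190),
    SGeneHit("SLF16", "scaffold149356_cov63", 2300088, 2077817, 2076645),
    SGeneHit("SLF23ψ", "scaffold150928_cov63", 230831, 9914, 8757),
    SGeneHit("SLF11ψ", "scaffold150938_cov63", 702656, 443047, 444222),
]

for hit in HITS:
    strand, span = strand_and_span(hit)
    print(f"{hit.query}\t{hit.scaffold}\t{strand}\t{span} bp")
```

Under these assumptions every hit spans roughly 1.1 to 1.3 kb, and SLF15 and SLF16 sit on the same scaffold in opposite orientations.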


© 2023 National Genomics Data Center, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences