Aegilops searsii Assembly & Annotation

Overview

Analysis Name: Aegilops searsii Assembly & Annotation
Sequencing technology: Oxford Nanopore
Assembly method: wtdbg2 version 2
Release Date: 2022-04-12
Reference Publication(s):

Li LF, Zhang ZB, Wang ZH, Li N, Sha Y, Wang XF, Ding N, Li Y, Zhao J, Wu Y, Gong L, Mafessoni F, Levy AA, Liu B. Genome sequences of five Sitopsis species of Aegilops and the origin of polyploid wheat B subgenome. Mol Plant. 2022 Mar 7;15(3):488-503. doi: 10.1016/j.molp.2021.12.019.

Abstract

Common wheat (Triticum aestivum, BBAADD) is a major staple food crop worldwide. The diploid progenitors of the A and D subgenomes have been unequivocally identified; that of B, however, remains ambiguous and controversial but is suspected to be related to species of Aegilops, section Sitopsis. Here, we report the assembly of chromosome-level genome sequences of all five Sitopsis species, namely Aegilops bicornis, Ae. longissima, Ae. searsii, Ae. sharonensis, and Ae. speltoides, as well as the partial assembly of the Amblyopyrum muticum (synonym Aegilops mutica) genome for phylogenetic analysis. Our results reveal that the donor of the common wheat B subgenome is a distinct, and most probably extinct, diploid species that diverged from an ancestral progenitor of the B lineage to which the still extant Ae. speltoides and Am. muticum belong. In addition, we identified interspecific genetic introgressions throughout the evolution of the Triticum/Aegilops species complex. The five Sitopsis species have various assembled genome sizes (4.11–5.89 Gb) with high proportions of repetitive sequences (85.99%–89.81%); nonetheless, they retain high collinearity with other genomes or subgenomes of species in the Triticum/Aegilops complex. Differences in genome size were primarily due to independent post-speciation amplification of transposons. We also identified a set of Sitopsis genes pertinent to important agronomic traits that can be harnessed for wheat breeding. These newly assembled genome resources provide a new roadmap for evolutionary and genetic studies of the Triticum/Aegilops complex, as well as for wheat improvement.

Assembly statistics

Genome size (bp): 5,336,892,668
GC content: 46.01%
Number of chromosome sequences: 7
Number of genome sequences: 24,640
Maximum genome sequence length (bp): 750,148,016
Minimum genome sequence length (bp): 1,477
Average genome sequence length (bp): 216,595
Genome sequence N50 (bp): 646,289,097
Genome sequence N90 (bp): 94,610
Assembly level: Chromosome
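
For readers unfamiliar with the N50 and N90 figures, the following is a minimal Python sketch of how such values are conventionally derived from a list of sequence lengths. It is not the pipeline used to produce the statistics above; the function name `nx` and the toy lengths are illustrative assumptions. The last lines check that the reported average length agrees with the reported genome size and sequence count.

```python
def nx(lengths, percent):
    """Return the Nx value for a list of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    cover at least `percent` percent of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding issues.
        if running * 100 >= total * percent:
            return length


# Toy lengths (made up for illustration, not taken from the assembly).
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = nx(toy_lengths, 50)
toy_n90 = nx(toy_lengths, 90)

# Consistency check using the figures reported on this page:
# genome size divided by the number of sequences, rounded to whole bp.
reported_mean = round(5_336_892_668 / 24_640)
```

An N50 close to the chromosome scale alongside a much smaller N90 is what one would expect when seven chromosome-length sequences carry most of the bases and many short unplaced sequences make up the remainder.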

Assembly

The Aegilops searsii assembly file is available in FASTA format.

Downloads

Chromosomes (FASTA file): GWHBFXU00000000.1.genome.fasta.gz
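
As an illustration of working with a FASTA file like the one listed above, here is a minimal Python sketch that tallies the length and G+C count of each record. The function name `fasta_lengths_and_gc` and the sample records are hypothetical; only standard FASTA conventions (a `>` header line followed by sequence lines) are assumed.

```python
def fasta_lengths_and_gc(lines):
    """Return {record_name: (length, gc_count)} for FASTA-formatted lines."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record name.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = [0, 0]
        elif name is not None:
            seq = line.upper()
            records[name][0] += len(seq)
            records[name][1] += seq.count("G") + seq.count("C")
    return {n: (length, gc) for n, (length, gc) in records.items()}


# Made-up records for illustration only.
sample_fasta = [">seqA demo", "ACGT", "GGCC", ">seqB", "ATAT"]
fasta_stats = fasta_lengths_and_gc(sample_fasta)
```

For the real download, the same function could be fed a text-mode handle from `gzip.open(path, "rt")`, assuming the file has been saved locally; streaming line by line keeps memory use low for a multi-gigabase assembly.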

Gene Predictions

Gene prediction files for the Aegilops searsii genome are available in GFF3 and FASTA formats.

Downloads

Genes (GFF3 file): GWHBFXU00000000.1.gff.gz
CDS sequences (FASTA file): GWHBFXU00000000.1.RNA.fasta.gz
Protein sequences (FASTA file): GWHBFXU00000000.1.Protein.faa.gz
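
A quick way to get oriented in a GFF3 annotation is to count its feature types. The sketch below assumes only the standard nine-column, tab-separated GFF3 layout; the function name `count_feature_types` and the sample records (sequence `chrDemo`, IDs `gene1` and so on) are invented for illustration and do not come from the file listed above.

```python
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 records by the value of the third (type) column."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives (##...) and comments (#...).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A well-formed GFF3 record has exactly nine columns.
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts


# Hypothetical records for illustration only.
sample_gff = [
    "##gff-version 3",
    "chrDemo\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chrDemo\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chrDemo\tdemo\tCDS\t100\t400\t.\t+\t0\tID=cds1;Parent=mrna1",
    "chrDemo\tdemo\tCDS\t600\t900\t.\t+\t2\tID=cds2;Parent=mrna1",
]
feature_counts = count_feature_types(sample_gff)
```

With the downloaded annotation, the lines could instead come from `gzip.open(path, "rt")`, assuming a local copy of the file; which feature types actually appear in it is not stated on this page.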

Functional Analysis

Functional annotation for the Aegilops searsii genome is available for download below. The proteins were analyzed using InterProScan to assign InterPro domains (Pfam).

Downloads

Domain from InterProScan: Aegilops_searsii.Pfam.tsv.gz

S genes

Summary

Query | Chromosome | Size (bp) | Coordinates | tBLASTn Hit | tBLASTn %ID | Domain
DUF247II-SΨ | GWHBFXU00000001.1 | 571561403 | 107790188-107791063 | Aatlantica_DUF247II-S | | DUF247
HPS10-S | GWHBFXU00000001.1 | 571561403 | 107457527-107457663, 107457737-107457815 | LpsS_contig11029 | | -
DUF247I-ZΨ | GWHBFXU00000002.1 | 750148016 | 698759165-698760373 | AlongiglumisDUF247I-Z | | DUF247
DUF247II-ZΨ | GWHBFXU00000002.1 | 750148016 | 698756533-698757161 | AlongiglumisDUF247II-Z | | DUF247
HPS10-Z | GWHBFXU00000002.1 | 750148016 | 698758780-698758912, 698758960-698759006 | AerianthaHPS10-Z | | -
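
The coordinate entries in the summary above are either a single range or a comma-separated list of ranges. The Python sketch below shows one way to parse such a string and total the covered bases. It assumes the coordinates are 1-based and inclusive, which this page does not state; the function names are hypothetical.

```python
def parse_intervals(coords):
    """Parse 'start-end, start-end' into a list of (start, end) int tuples."""
    intervals = []
    for part in coords.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, end_text = part.split("-")
        intervals.append((int(start_text), int(end_text)))
    return intervals


def covered_length(intervals):
    """Total bases covered, ASSUMING 1-based inclusive coordinates."""
    return sum(end - start + 1 for start, end in intervals)


# Coordinate string copied from the HPS10-S row of the summary.
hps10_s = parse_intervals("107457527-107457663, 107457737-107457815")
hps10_s_length = covered_length(hps10_s)

# Single-range entry copied from the DUF247II-SΨ row.
duf247ii_s_length = covered_length(parse_intervals("107790188-107791063"))
```

Under the inclusive assumption, the two HPS10-S segments span 137 and 79 bases; if the coordinates were instead half-open, each segment would be one base shorter.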

© 2023 National Genomics Data Center, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences