Triticum aestivum IWGSC_CS_RefSeq_v2.1 Assembly & Annotation

Overview

Analysis Name Triticum aestivum IWGSC_CS_RefSeq_v2.1 Assembly & Annotation
Sequencing technology PacBio; Illumina HiSeq
Assembly method FALCON v. 0.2.2; DenovoMAGIC v. 2
Release Date 2021-05-06
Reference Publication(s)

Zhu T, Wang L, Rimbert H, Rodriguez JC, Deal KR, De Oliveira R, Choulet F, Keeble-Gagnère G, Tibbits J, Rogers J, Eversole K, Appels R, Gu YQ, Mascher M, Dvorak J, Luo MC. Optical maps refine the bread wheat Triticum aestivum cv. Chinese Spring genome assembly. Plant J. 2021 Jul;107(1):303-314. doi: 10.1111/tpj.15289.

Summary

Until recently, achieving a reference-quality genome sequence for bread wheat was long thought beyond the limits of genome sequencing and assembly technology, primarily due to the large genome size and > 80% repetitive sequence content. The release of the chromosome scale 14.5-Gb IWGSC RefSeq v1.0 genome sequence of bread wheat cv. Chinese Spring (CS) was, therefore, a milestone. Here, we used a direct label and stain (DLS) optical map of the CS genome together with a prior nick, label, repair and stain (NLRS) optical map, and sequence contigs assembled with Pacific Biosciences long reads, to refine the v1.0 assembly. Inconsistencies between the sequence and maps were reconciled and gaps were closed. Gap filling and anchoring of 279 unplaced scaffolds increased the total length of pseudomolecules by 168 Mb (excluding Ns). Positions and orientations were corrected for 233 and 354 scaffolds, respectively, representing 10% of the genome sequence. The accuracy of the remaining 90% of the assembly was validated. As a result of the increased contiguity, the numbers of transposable elements (TEs) and intact TEs have increased in IWGSC RefSeq v2.1 compared with v1.0. In total, 98% of the gene models identified in v1.0 were mapped onto this new assembly through development of a dedicated approach implemented in the MAGAAT pipeline. The numbers of high-confidence genes on pseudomolecules have increased from 105 319 to 105 534. The reconciled assembly enhances the utility of the sequence for genetic mapping, comparative genomics, gene annotation and isolation, and more general studies on the biology of wheat.

Assembly statistics

Genome size 14.6 Gb
Total ungapped length 14.3 Gb
Number of chromosomes 21
Number of scaffolds 91,588
Scaffold N50 713.4 Mb
Scaffold L50 10
Number of contigs 306,348
Contig N50 341.3 kb
Contig L50 12,216
GC percent 46
Genome coverage 234.0x
Assembly level Chromosome
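
For readers unfamiliar with the N50 and L50 figures listed above, the following is a minimal sketch of how these two statistics are commonly defined: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length, and L50 is the number of sequences in that set. The function name `n50_l50` and the toy lengths are illustrative assumptions and are not taken from the assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must not be empty")


# Toy lengths for illustration only (hypothetical, not assembly data).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_lengths))  # (70, 2)
```

Applied to scaffold lengths this gives the scaffold N50 and L50; applied to contig lengths it gives the contig values.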

Assembly

The Triticum aestivum IWGSC_CS_RefSeq_v2.1 Assembly file is available in FASTA format.

Downloads

Chromosomes (FASTA file) GCF_018294505.1_IWGSC_CS_RefSeq_v2.1_genomic.fna.gz
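
Because the uncompressed genome is large, it can be convenient to stream the gzipped FASTA file rather than load it into memory. The sketch below uses only the Python standard library to report the length of each sequence record. The function name `sequence_lengths` is an assumption made for illustration, and the code assumes a plain multi-record FASTA file in which each header line starts with ">".

```python
import gzip


def sequence_lengths(path):
    """Yield (sequence_id, length) for each record in a gzipped FASTA file."""
    seq_id, length = None, 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if seq_id is not None:
                    yield seq_id, length
                # Keep only the first whitespace-delimited token as the ID.
                seq_id = (line[1:].split() or [""])[0]
                length = 0
            elif seq_id is not None:
                length += len(line)
    if seq_id is not None:
        yield seq_id, length


# Example usage, assuming the file has been downloaded to the working directory:
# for name, size in sequence_lengths(
#         "GCF_018294505.1_IWGSC_CS_RefSeq_v2.1_genomic.fna.gz"):
#     print(name, size)
```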

Gene Predictions

The Triticum aestivum IWGSC_CS_RefSeq_v2.1 genome gene prediction files are available in GFF3 and FASTA format.

Downloads

Genes (GFF3 file) GCF_018294505.1_IWGSC_CS_RefSeq_v2.1_genomic.gff.gz
CDS sequences (FASTA file) GCF_018294505.1_IWGSC_CS_RefSeq_v2.1_cds_from_genomic.fna.gz
Protein sequences (FASTA file) GCF_018294505.1_IWGSC_CS_RefSeq_v2.1_protein.faa.gz
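
As a quick check on a downloaded annotation file, one can tally how many features of each type it contains. This is a minimal sketch that assumes the standard nine-column, tab-delimited GFF3 layout, in which the third column holds the feature type. The function name `count_feature_types` is hypothetical.

```python
import gzip
from collections import Counter


def count_feature_types(path):
    """Count GFF3 features by type (column 3) in a gzipped GFF3 file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip directives, comments, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            columns = line.rstrip("\n").split("\t")
            # Only well-formed nine-column feature lines are counted.
            if len(columns) == 9:
                counts[columns[2]] += 1
    return counts


# Example usage, assuming the file has been downloaded to the working directory:
# counts = count_feature_types(
#     "GCF_018294505.1_IWGSC_CS_RefSeq_v2.1_genomic.gff.gz")
# print(counts.most_common(10))
```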

Functional Analysis

Functional annotation for the Triticum aestivum IWGSC_CS_RefSeq_v2.1 assembly is available for download below. The proteins were analyzed using InterProScan to assign InterPro domains (Pfam).

Downloads

Domain from InterProScan Triticum_aestivum.Pfam.tsv.gz

S genes

Summary

Query         Chromosome   Size (bp)  Coordinates                               tBLASTn Hit               tBLASTn %ID  Domain
DUF247I-S1    NC_057794.1  598660471  84391168-84392763                         LpSDUF247-I_chromosome1   79           DUF247
DUF247I-S2    NC_057795.1  700547350  145709610-145711226                       LpSDUF247-I_chromosome1   78           DUF247
DUF247I-S3Ψ   NC_057796.1  498638509  89684709-89685896                         LpSDUF247-I_chromosome1   73           DUF247
DUF247II-S1   NC_057794.1  598660471  83014913-83016532                         LpSDUF247-II_chromosome1  74           DUF247
DUF247II-S2Ψ  NC_057795.1  700547350  145421961-145422617                       LrDUF247II-S1             73           DUF247
DUF247II-S3Ψ  NC_057796.1  498638509  89391973-89392746                         LpSDUF247-II_chromosome1  67           DUF247
HPS10-S1      NC_057794.1  598660471  84386976-84387099, 84387205-84387338      LpsS_contig12948          38           -
HPS10-S2      NC_057795.1  700547350  145429098-145429201, 145429309-145429435  LpsS_contig11029          65           -
HPS10-S3      NC_057796.1  498638509  89682681-89682768, 89682854-89682987      LpsS_contig11029          57           -
DUF247I-Z1Ψ   NC_057797.1  787782082  743917546-743918082                       Pinfirma                  82           DUF247
DUF247I-Z2    NC_057798.1  812755788  750154456-750156045                       LpZDUF247-I_chromosome2   60           DUF247
DUF247I-Z3Ψ   NC_057799.1  656544405  611391777-611392115                       AsativaDUF247I-Z          66           DUF247
DUF247II-Z1Ψ  NC_057797.1  787782082  743921706-743922455                       LrDUF247II-Z              62           DUF247
DUF247II-Z2Ψ  NC_057798.1  812755788  750060149-750060499                       LrDUF247II-Z              61           DUF247
DUF247II-Z3Ψ  NC_057799.1  656544405  611382800-611382949                       AsativaDUF247II-Z1        78           DUF247
HPS10-Z1      NC_057797.1  787782082  743919453-743919597, 743919721-743919830  LpsZ_chromosome2          34           -
HPS10-Z2      NC_057798.1  812755788  750151591-750151723, 750151901-750152004  AerianthaHPS10-Z          61           -
HPS10-Z3      NC_057799.1  656544405  611385099-611385211, 611385286-611385436  LpsZ_chromosome2          58           -
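
The coordinates in the table above are given as start-end ranges, with two comma-separated ranges for the HPS10 entries. The sketch below parses such a coordinate string and sums the interval lengths. It assumes the coordinates are 1-based and inclusive, which the source does not state explicitly. The function names `parse_intervals` and `total_length` are hypothetical.

```python
def parse_intervals(text):
    """Parse 'start-end' ranges separated by commas into (start, end) tuples."""
    intervals = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        start, end = (int(value) for value in part.split("-"))
        intervals.append((start, end))
    return intervals


def total_length(text):
    """Sum interval lengths, assuming 1-based inclusive coordinates."""
    return sum(end - start + 1 for start, end in parse_intervals(text))


# DUF247I-S1 (single range) and HPS10-S1 (two ranges) from the table.
print(total_length("84391168-84392763"))                      # 1596
print(total_length("84386976-84387099, 84387205-84387338"))   # 258
```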

© 2023 National Genomics Data Center, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences