Triticum urartu G1812 Assembly & Annotation

Overview

Analysis Name: Triticum urartu G1812 Assembly & Annotation
Sequencing technology: Illumina HiSeq; PacBio
Assembly method: MaSuRCA v. 3.2
Release Date: 2018-04-30
Reference Publication(s)

Ling HQ, Ma B, Shi X, Liu H, Dong L, Sun H, Cao Y, Gao Q, Zheng S, Li Y, Yu Y, Du H, Qi M, Li Y, Lu H, Yu H, Cui Y, Wang N, Chen C, Wu H, Zhao Y, Zhang J, Li Y, Zhou W, Zhang B, Hu W, van Eijk MJT, Tang J, Witsenboer HMA, Zhao S, Li Z, Zhang A, Wang D, Liang C. Genome sequence of the progenitor of wheat A subgenome Triticum urartu. Nature. 2018 May;557(7705):424-428. doi: 10.1038/s41586-018-0108-0.

Abstract

Triticum urartu (diploid, AA) is the progenitor of the A subgenome of tetraploid (Triticum turgidum, AABB) and hexaploid (Triticum aestivum, AABBDD) wheat. Genomic studies of T. urartu have been useful for investigating the structure, function and evolution of polyploid wheat genomes. Here we report the generation of a high-quality genome sequence of T. urartu by combining bacterial artificial chromosome (BAC)-by-BAC sequencing, single molecule real-time whole-genome shotgun sequencing, linked reads and optical mapping. We assembled seven chromosome-scale pseudomolecules and identified protein-coding genes, and we suggest a model for the evolution of T. urartu chromosomes. Comparative analyses with genomes of other grasses showed gene loss and amplification in the numbers of transposable elements in the T. urartu genome. Population genomics analysis of 147 T. urartu accessions from across the Fertile Crescent showed clustering of three groups, with differences in altitude and biostress, such as powdery mildew disease. The T. urartu genome assembly provides a valuable resource for studying genetic variation in wheat and related grasses, and promises to facilitate the discovery of genes that could be useful for wheat improvement.

Assembly statistics

Genome size: 4.8 Gb
Total ungapped length: 4.8 Gb
Number of chromosomes: 7
Number of scaffolds: 10,204
Scaffold N50: 661.5 Mb
Scaffold L50: 4
Number of contigs: 44,013
Contig N50: 278.6 kb
Contig L50: 5,048
GC percent: 46
Genome coverage: 335.0x
Assembly level: Chromosome
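
For readers unfamiliar with the N50 and L50 figures listed above, the following minimal sketch shows how these two statistics are conventionally computed from a list of sequence lengths. The function name `n50_l50` and the toy lengths are illustrative assumptions and are not part of this release.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    length; L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


# Toy example with made-up lengths (not the T. urartu scaffolds).
toy_lengths = [100, 80, 60, 40, 20]
print(n50_l50(toy_lengths))  # (80, 2)
```

Read this way, a scaffold L50 of 4 means that the four longest scaffolds together account for at least half of the assembled sequence.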

Assembly

The Triticum urartu G1812 genome assembly is available in FASTA format.

Downloads

Chromosomes (FASTA file) GCF_003073215.2_Tu2.1_genomic.fna.gz
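
A downloaded chromosome file can be inspected without decompressing it to disk. The sketch below streams a gzipped FASTA file and reports the length of each record. It uses only the Python standard library; the function name `fasta_lengths` is a hypothetical helper, and the local file path is assumed to point at a copy of the file listed above.

```python
import gzip


def fasta_lengths(path):
    """Yield (record_id, length) for each record in a gzipped FASTA file."""
    name, length = None, 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, length
                # The record ID is the first whitespace-delimited token.
                tokens = line[1:].split()
                name, length = (tokens[0] if tokens else ""), 0
            elif name is not None:
                length += len(line)
    if name is not None:
        yield name, length


# Usage, assuming the file has been downloaded to the working directory:
# for record_id, length in fasta_lengths("GCF_003073215.2_Tu2.1_genomic.fna.gz"):
#     print(record_id, length)
```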

Gene Predictions

Gene prediction files for the Triticum urartu G1812 genome are available in GFF3 and FASTA formats.

Downloads

Genes (GFF3 file) GCF_003073215.2_Tu2.1_genomic.gff.gz
CDS sequences (FASTA file) GCF_003073215.2_Tu2.1_cds_from_genomic.fna.gz
Protein sequences (FASTA file) GCF_003073215.2_Tu2.1_protein.faa.gz
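
As a quick sanity check on a downloaded annotation, the feature types in the GFF3 file can be tallied. The sketch below assumes the file follows the standard nine-column, tab-separated GFF3 layout; the function name `count_feature_types` is a hypothetical helper, not a tool supplied with this release.

```python
import gzip
from collections import Counter


def count_feature_types(path):
    """Count GFF3 features by type (column 3) in a gzipped GFF3 file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an optional embedded sequence section follows
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9:
                continue  # not a well-formed feature line
            counts[fields[2]] += 1
    return counts


# Usage, assuming the file has been downloaded to the working directory:
# for feature_type, n in count_feature_types(
#         "GCF_003073215.2_Tu2.1_genomic.gff.gz").most_common():
#     print(feature_type, n)
```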

Functional Analysis

Functional annotation for the Triticum urartu G1812 genome is available for download below. The proteins were analyzed with InterProScan to assign InterPro domains (Pfam).

Downloads

Domain from InterProScan Triticum_urartu.Pfam.tsv.gz
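
The layout of the domain file is not described on this page. The sketch below assumes it follows the standard InterProScan TSV output, in which column 1 is the protein accession, column 4 the analysis name, column 5 the signature accession, column 6 the signature description, and columns 7 and 8 the start and stop positions. If the file differs, the column indices will need adjusting. The function name `pfam_domains_by_protein` is a hypothetical helper.

```python
import gzip
from collections import defaultdict


def pfam_domains_by_protein(path):
    """Group Pfam hits by protein from a gzipped InterProScan-style TSV.

    Returns {protein_id: [(pfam_accession, description, start, stop), ...]}.
    Column positions are an assumption based on standard InterProScan output.
    """
    domains = defaultdict(list)
    with gzip.open(path, "rt") as handle:
        for line in handle:
            row = line.rstrip("\n").split("\t")
            if len(row) < 8 or row[3] != "Pfam":
                continue  # skip short lines, headers and non-Pfam analyses
            domains[row[0]].append((row[4], row[5], int(row[6]), int(row[7])))
    return dict(domains)


# Usage, assuming the file has been downloaded to the working directory:
# hits = pfam_domains_by_protein("Triticum_urartu.Pfam.tsv.gz")
# print(len(hits), "proteins with at least one Pfam domain")
```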

S genes

Summary

Query         Chromosome   Size (bp)  Coordinates                               tBLASTn Hit               tBLASTn %ID  Domain
DUF247I-S1    NC_053022.1  584211917  83573620-83575215                         LpSDUF247-I_chromosome1   79           DUF247
DUF247I-S2    NC_053026.1  661480603  627852843-627854438                       LpSDUF247-I_chromosome1   79           DUF247
DUF247II-S1Ψ  NC_053022.1  584211917  82296854-82297690                         LpSDUF247-II_chromosome1  75           DUF247
DUF247II-S2Ψ  NC_053022.1  584211917  82230122-82230958                         LpSDUF247-II_chromosome1  75           DUF247
HPS10-S1      NC_053022.1  584211917  83572257-83572380, 83572486-83572619      LpsS_contig11029          53           -
HPS10-S2      NC_053026.1  661480603  627855439-627855572, 627855660-627855801  LpsS_contig11029          53           -
DUF247I-ZΨ    NC_053023.1  753719114  716921513-716922049                       Dglomerata                64           DUF247
DUF247II-ZΨ   NC_053023.1  753719114  716925302-716926051                       LrDUF247II-Z              62           DUF247
HPS10-Z       NC_053023.1  753719114  716923415-716923559, 716923683-716923792  LpsZ_chromosome2          35           -
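
The coordinates in the summary table can be turned into sequence slices once the corresponding chromosome is loaded. The sketch below parses a coordinate string, including the two-segment HPS10 entries, and concatenates the segments. It assumes the coordinates are 1-based and inclusive, which is not stated on this page, and it does not handle strand, so features on the reverse strand would still need reverse-complementing. All function names are hypothetical helpers.

```python
def parse_intervals(text):
    """Parse 'start-end, start-end' into a list of (start, end) integer pairs."""
    intervals = []
    for part in text.split(","):
        start, end = part.strip().split("-")
        intervals.append((int(start), int(end)))
    return intervals


def spliced_length(intervals):
    """Total number of bases covered, treating each interval as inclusive."""
    return sum(end - start + 1 for start, end in intervals)


def extract(sequence, intervals):
    """Concatenate 1-based, inclusive intervals from a sequence string."""
    return "".join(sequence[start - 1:end] for start, end in intervals)


# HPS10-S1 coordinates copied from the summary table.
hps10_s1 = parse_intervals("83572257-83572380, 83572486-83572619")
print(hps10_s1)                  # [(83572257, 83572380), (83572486, 83572619)]
print(spliced_length(hps10_s1))  # 258

# Toy sequence to show the slicing convention.
print(extract("ACGTACGTAC", [(2, 4), (7, 8)]))  # CGTGT
```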


© 2023 National Genomics Data Center, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences