Analysis Name | Triticum urartu G1812 Assembly & Annotation |
Sequencing technology | Illumina HiSeq; PacBio |
Assembly method | MaSuRCA v. 3.2 |
Release Date | 2018-04-30 |
Ling HQ, Ma B, Shi X, Liu H, Dong L, Sun H, Cao Y, Gao Q, Zheng S, Li Y, Yu Y, Du H, Qi M, Li Y, Lu H, Yu H, Cui Y, Wang N, Chen C, Wu H, Zhao Y, Zhang J, Li Y, Zhou W, Zhang B, Hu W, van Eijk MJT, Tang J, Witsenboer HMA, Zhao S, Li Z, Zhang A, Wang D, Liang C. Genome sequence of the progenitor of wheat A subgenome Triticum urartu. Nature. 2018 May;557(7705):424-428. doi: 10.1038/s41586-018-0108-0.
Abstract
Triticum urartu (diploid, AA) is the progenitor of the A subgenome of tetraploid (Triticum turgidum, AABB) and hexaploid (Triticum aestivum, AABBDD) wheat [1,2]. Genomic studies of T. urartu have been useful for investigating the structure, function and evolution of polyploid wheat genomes. Here we report the generation of a high-quality genome sequence of T. urartu by combining bacterial artificial chromosome (BAC)-by-BAC sequencing, single molecule real-time whole-genome shotgun sequencing [3], linked reads and optical mapping [4,5]. We assembled seven chromosome-scale pseudomolecules and identified protein-coding genes, and we suggest a model for the evolution of T. urartu chromosomes. Comparative analyses with genomes of other grasses showed gene loss and amplification in the numbers of transposable elements in the T. urartu genome. Population genomics analysis of 147 T. urartu accessions from across the Fertile Crescent showed clustering of three groups, with differences in altitude and biostress, such as powdery mildew disease. The T. urartu genome assembly provides a valuable resource for studying genetic variation in wheat and related grasses, and promises to facilitate the discovery of genes that could be useful for wheat improvement.
Assembly statistics
Genome size | 4.8 Gb |
Total ungapped length | 4.8 Gb |
Number of chromosomes | 7 |
Number of scaffolds | 10,204 |
Scaffold N50 | 661.5 Mb |
Scaffold L50 | 4 |
Number of contigs | 44,013 |
Contig N50 | 278.6 kb |
Contig L50 | 5,048 |
GC percent | 46 |
Genome coverage | 335.0x |
Assembly level | Chromosome |
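For readers less familiar with the N50 and L50 figures in the table above, the following is a minimal Python sketch of how these statistics are conventionally computed from a list of sequence lengths. The function name and the toy lengths are illustrative assumptions and are not taken from the assembly pipeline.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    length; L50 is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy example with made-up lengths (not real assembly data).
toy_lengths = [100, 80, 60, 40, 20]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)
```

Applied to the scaffold lengths of this assembly, the same calculation would be expected to yield the Scaffold N50 and Scaffold L50 values listed above.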
The Triticum urartu G1812 assembly file is available in FASTA format.
Downloads
Chromosomes (FASTA file) | GCF_003073215.2_Tu2.1_genomic.fna.gz |
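As a minimal sketch of how the gzipped FASTA file could be inspected once downloaded, the Python below streams the file and records the length of each sequence. It assumes the file has been saved locally under the name shown above; the function names are hypothetical and only the standard library is used.

```python
import gzip


def sequence_lengths(lines):
    """Map each FASTA record ID to its sequence length.

    `lines` is any iterable of text lines (for example an open file).
    The record ID is taken as the first whitespace-delimited token
    after the '>' character.
    """
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def sequence_lengths_from_gz(path):
    """Stream a gzipped FASTA file without decompressing it to disk."""
    with gzip.open(path, "rt") as handle:
        return sequence_lengths(handle)


# Usage, assuming the file was downloaded to the working directory:
# lengths = sequence_lengths_from_gz("GCF_003073215.2_Tu2.1_genomic.fna.gz")
# print(len(lengths), "sequences")
```

Streaming line by line keeps memory use low, which matters for a multi-gigabase genome such as this one.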
The Triticum urartu G1812 genome gene prediction files are available in GFF3 and FASTA format.
Downloads
Genes (GFF3 file) | GCF_003073215.2_Tu2.1_genomic.gff.gz |
CDS sequences (FASTA file) | GCF_003073215.2_Tu2.1_cds_from_genomic.fna.gz |
Protein sequences (FASTA file) | GCF_003073215.2_Tu2.1_protein.faa.gz |
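A quick way to get an overview of a GFF3 annotation is to tally the feature types it contains. The sketch below relies only on the standard GFF3 layout (nine tab-separated columns, with the feature type in the third column); the function names are hypothetical, and the file name in the usage comment assumes a local download.

```python
import gzip
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 features by type (column 3).

    Comment and directive lines (starting with '#'), blank lines, and
    lines without exactly nine tab-separated fields are skipped.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts


def count_feature_types_from_gz(path):
    """Apply count_feature_types to a gzipped GFF3 file."""
    with gzip.open(path, "rt") as handle:
        return count_feature_types(handle)


# Usage, assuming the file was downloaded to the working directory:
# counts = count_feature_types_from_gz("GCF_003073215.2_Tu2.1_genomic.gff.gz")
# for feature_type, n in counts.most_common():
#     print(feature_type, n)
```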
Functional annotation for the Triticum urartu G1812 genome is available for download below. The proteins were analyzed using InterProScan to assign InterPro domains (Pfam).
Downloads
Domain from InterProScan | Triticum_urartu.Pfam.tsv.gz |
Summary
Query | Chromosome | Chromosome size (bp) | Coordinates | tBLASTn hit | tBLASTn % identity | Domain |
DUF247I-S1 | NC_053022.1 | 584211917 | 83573620-83575215 | LpSDUF247-I_chromosome1 | 79 | DUF247 |
DUF247I-S2 | NC_053026.1 | 661480603 | 627852843-627854438 | LpSDUF247-I_chromosome1 | 79 | DUF247 |
DUF247II-S1Ψ | NC_053022.1 | 584211917 | 82296854-82297690 | LpSDUF247-II_chromosome1 | 75 | DUF247 |
DUF247II-S2Ψ | NC_053022.1 | 584211917 | 82230122-82230958 | LpSDUF247-II_chromosome1 | 75 | DUF247 |
HPS10-S1 | NC_053022.1 | 584211917 | 83572257-83572380,83572486-83572619 | LpsS_contig11029 | 53 | - |
HPS10-S2 | NC_053026.1 | 661480603 | 627855439-627855572,627855660-627855801 | LpsS_contig11029 | 53 | - |
DUF247I-ZΨ | NC_053023.1 | 753719114 | 716921513-716922049 | Dglomerata | 64 | DUF247 |
DUF247II-ZΨ | NC_053023.1 | 753719114 | 716925302-716926051 | LrDUF247II-Z | 62 | DUF247 |
HPS10-Z | NC_053023.1 | 753719114 | 716923415-716923559,716923683-716923792 | LpsZ_chromosome2 | 35 | - |
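The Coordinates column above mixes single ranges with comma-separated multi-segment ranges. The Python sketch below shows one way to parse such a string into intervals and sum their lengths. It assumes the coordinates are 1-based and inclusive, which the table does not state; the function names are hypothetical.

```python
def parse_coordinates(text):
    """Parse a string such as '10-20,30-45' into a list of (start, end) tuples."""
    intervals = []
    for part in text.split(","):
        start, end = part.strip().split("-")
        intervals.append((int(start), int(end)))
    return intervals


def total_length(intervals):
    """Sum interval lengths, assuming 1-based inclusive coordinates."""
    return sum(end - start + 1 for start, end in intervals)


# Coordinates copied from the HPS10-S1 row of the table above.
hps10_s1 = parse_coordinates("83572257-83572380,83572486-83572619")
print(hps10_s1)
print(total_length(hps10_s1))
```

If the coordinates turn out to be 0-based or half-open, the `+ 1` in `total_length` would need to be adjusted accordingly.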