Aegilops umbellulata TA1851 Assembly & Annotation

Overview

Analysis Name Aegilops umbellulata TA1851 Assembly & Annotation
Sequencing technology PacBio Sequel
Assembly method LJA v. 0.2; DeepConsensus v. 1.0.0
Release Date 2023-10-09
Reference Publication(s)

Abrouk M, Wang Y, Cavalet-Giorsa E, Troukhan M, Kravchuk M, Krattinger SG. Chromosome-scale assembly of the wild wheat relative Aegilops umbellulata. Sci Data. 2023 Oct 25;10(1):739. doi: 10.1038/s41597-023-02658-2.

Abstract

Wild wheat relatives have been explored in plant breeding to increase the genetic diversity of bread wheat, one of the most important food crops. Aegilops umbellulata is a diploid U genome-containing grass species that serves as a genetic reservoir for wheat improvement. In this study, we report the construction of a chromosome-scale reference assembly of Ae. umbellulata accession TA1851 based on corrected PacBio HiFi reads and chromosome conformation capture. The total assembly size was 4.25 Gb with a contig N50 of 17.7 Mb. In total, 36,268 gene models were predicted. We benchmarked the performance of hifiasm and LJA, two of the most widely used assemblers using standard and corrected HiFi reads, revealing a positive effect of corrected input reads. Comparative genome analysis confirmed substantial chromosome rearrangements in Ae. umbellulata compared to bread wheat. In summary, the Ae. umbellulata assembly provides a resource for comparative genomics in Triticeae and for the discovery of agriculturally important genes.

Assembly statistics

Genome size 4.2 Gb
Number of chromosomes 7
Number of scaffolds 7
Scaffold N50 626.8 Mb
Scaffold L50 4
Number of contigs 430
Contig N50 17.9 Mb
Contig L50 77
Assembly level Chromosome
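
N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. L50 is the number of sequences in that set. The minimal Python sketch below follows that definition. The function name n50_l50 is hypothetical, and the toy lengths are illustrative rather than taken from this assembly.

```python
from typing import Sequence, Tuple

def n50_l50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of sequence lengths.

    Lengths are sorted longest first and accumulated until the running
    total reaches at least half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length, count
    raise RuntimeError("unreachable for non-negative lengths")

# Toy example (illustrative lengths, not real contigs)
print(n50_l50([100, 80, 50, 40, 30]))  # (80, 2)
```

Scaffold lengths give the scaffold N50 and L50. Contig lengths give the contig values.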

Assembly

The Aegilops umbellulata TA1851 genome assembly is available in FASTA format.

Downloads

Chromosomes (FASTA file) AeUmbellulata_TA1851_v1.fasta.gz

Gene Predictions

The Aegilops umbellulata TA1851 genome gene prediction files are available in GFF3 and FASTA format.

Downloads

Genes (GFF3 file) AeUmbellulata_TA1851_v1.gff3.gz
CDS sequences (FASTA file) AeUmbellulata_TA1851_v1.cds.fasta.gz
Protein sequences (FASTA file) AeUmbellulata_TA1851_v1.prot.fasta.gz
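
In a GFF3 file, every non-comment feature line has nine tab-separated columns, and the feature type is in column 3. The minimal sketch below counts feature types. One use is to compare the number of gene features with the 36,268 gene models reported in the abstract. It assumes the annotation uses the standard "gene" type term. The helper names and the example IDs are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable

def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count GFF3 feature types (column 3) in an iterable of lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded FASTA section: no more feature lines follow
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts

def count_features_in_file(path: str) -> Counter:
    # gzip.open in text mode ("rt") decompresses and decodes on the fly
    with gzip.open(path, "rt") as handle:
        return count_feature_types(handle)

# Illustrative lines (hypothetical IDs)
example = [
    "##gff-version 3\n",
    "chr1U_TA1851\tsource\tgene\t100\t900\t.\t+\t.\tID=gene1\n",
    "chr1U_TA1851\tsource\tmRNA\t100\t900\t.\t+\t.\tID=gene1.1;Parent=gene1\n",
    "chr1U_TA1851\tsource\tCDS\t150\t850\t.\t+\t0\tParent=gene1.1\n",
]
print(count_feature_types(example)["gene"])  # 1
```

On the downloaded file, the call would be count_features_in_file("AeUmbellulata_TA1851_v1.gff3.gz")["gene"].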

Functional Analysis

Functional annotation for Aegilops umbellulata TA1851 is available for download below. The predicted proteins were analyzed with InterProScan to assign InterPro domains (Pfam).

Downloads

Domain from InterProScan Aegilops_umbellulata_TA1851.Pfam.tsv.gz
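
This sketch assumes the Pfam table uses the standard InterProScan TSV layout, although the file may have been filtered or reformatted. In that layout, each row describes one domain match, and columns 1, 4 and 5 hold the protein accession, the analysis name and the signature accession. The minimal sketch below groups Pfam accessions by protein. The function names and example rows are hypothetical.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, Set

def domains_by_protein(lines: Iterable[str], analysis: str = "Pfam") -> Dict[str, Set[str]]:
    """Map each protein ID to the signature accessions reported by one analysis.

    Assumes the standard InterProScan TSV column order: column 1 is the
    protein accession, column 4 the analysis name, column 5 the signature
    accession.
    """
    result: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5 or fields[0].startswith("#"):
            continue  # skip blank, truncated or comment lines
        if fields[3] == analysis:
            result[fields[0]].add(fields[4])
    return dict(result)

def load_domains(path: str) -> Dict[str, Set[str]]:
    """Read a gzip-compressed InterProScan TSV file."""
    with gzip.open(path, "rt") as handle:
        return domains_by_protein(handle)

# Illustrative rows (hypothetical protein IDs)
rows = [
    "protA\tmd5\t350\tPfam\tPF00069\tProtein kinase domain\t10\t260\t1.2E-40\tT\t01-01-2023\n",
    "protB\tmd5\t120\tSMART\tSM00220\tS_TKc\t5\t110\t3.0E-10\tT\t01-01-2023\n",
]
print(domains_by_protein(rows))  # {'protA': {'PF00069'}}
```

To list the proteins that carry one Pfam family, filter the mapping, for example [p for p, accs in domains.items() if "PF00069" in accs].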

S genes

Summary

Query       Chromosome    Chromosome size (bp)  Coordinates                               tBLASTn hit               tBLASTn %ID  Domain
DUF247I-S   chr1U_TA1851  494422770             95856844-95858439                         LpSDUF247-I_chromosome1   81           DUF247
DUF247II-S  chr1U_TA1851  494422770             95679561-95681180                         LpSDUF247-II_chromosome1  75           DUF247
HPS10-S     chr1U_TA1851  494422770             95855477-95855624, 95855704-95855837      LpsS_contig12948          48           -
HPS10-Z     chr2U_TA1851  646201372             512154526-512154682, 512154763-512154890  Bromus_tectorum_HPS10-Z   63           -
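
The Coordinates column gives the position of each hit on the TA1851 chromosomes. Split hits are listed as two segments. The minimal Python sketch below pulls those segments out of the chromosome FASTA. It rests on three assumptions: the coordinates are 1-based and inclusive, as in BLAST output; the record IDs in AeUmbellulata_TA1851_v1.fasta.gz match the Chromosome column; and strand can be ignored, since the table does not report it. The helper names are hypothetical.

```python
import gzip
from typing import Dict, Iterable, List, Optional, Set, Tuple

def read_fasta(lines: Iterable[str], keep: Optional[Set[str]] = None) -> Dict[str, str]:
    """Parse FASTA lines into {record ID: sequence}.

    The record ID is the first word of the header line. If `keep` is given,
    only records whose ID is in `keep` are stored, which limits memory use.
    """
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            current = name if keep is None or name in keep else None
            if current is not None:
                records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}

def extract_segments(seq: str, segments: List[Tuple[int, int]]) -> str:
    """Concatenate 1-based, inclusive (start, end) segments of `seq` in order.

    No reverse complement is applied; strand must be handled separately.
    """
    return "".join(seq[start - 1:end] for start, end in segments)

def load_chromosome(path: str, name: str) -> str:
    """Read a single chromosome from a gzip-compressed FASTA file."""
    with gzip.open(path, "rt") as handle:
        return read_fasta(handle, keep={name})[name]

# Toy example with a made-up sequence
toy = [">chr_toy description\n", "ACGTACGTAC\n", "GGGTTTCCCA\n"]
seq = read_fasta(toy)["chr_toy"]
print(extract_segments(seq, [(3, 5), (11, 13)]))  # GTAGGG
```

Under these assumptions, the HPS10-S segments would come from extract_segments(load_chromosome("AeUmbellulata_TA1851_v1.fasta.gz", "chr1U_TA1851"), [(95855477, 95855624), (95855704, 95855837)]). Parsing a multi-gigabyte gzip file line by line is slow. For repeated lookups, an indexed reader such as samtools faidx, run on an uncompressed or bgzip-compressed copy, is the usual choice.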


© 2023 National Genomics Data Center, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences