Analysis Name | Aegilops umbellulata TA1851 Assembly & Annotation |
Sequencing technology | PacBio Sequel |
Assembly method | LJA v. 0.2; DeepConsensus v. 1.0.0 |
Release Date | 2023-10-09 |
Abrouk M, Wang Y, Cavalet-Giorsa E, Troukhan M, Kravchuk M, Krattinger SG. Chromosome-scale assembly of the wild wheat relative Aegilops umbellulata. Sci Data. 2023 Oct 25;10(1):739. doi: 10.1038/s41597-023-02658-2.
Abstract
Wild wheat relatives have been explored in plant breeding to increase the genetic diversity of bread wheat, one of the most important food crops. Aegilops umbellulata is a diploid U genome-containing grass species that serves as a genetic reservoir for wheat improvement. In this study, we report the construction of a chromosome-scale reference assembly of Ae. umbellulata accession TA1851 based on corrected PacBio HiFi reads and chromosome conformation capture. The total assembly size was 4.25 Gb with a contig N50 of 17.7 Mb. In total, 36,268 gene models were predicted. We benchmarked the performance of hifiasm and LJA, two of the most widely used assemblers using standard and corrected HiFi reads, revealing a positive effect of corrected input reads. Comparative genome analysis confirmed substantial chromosome rearrangements in Ae. umbellulata compared to bread wheat. In summary, the Ae. umbellulata assembly provides a resource for comparative genomics in Triticeae and for the discovery of agriculturally important genes.
Assembly statistics
Genome size | 4.2 Gb |
Number of chromosomes | 7 |
Number of scaffolds | 7 |
Scaffold N50 | 626.8 Mb |
Scaffold L50 | 4 |
Number of contigs | 430 |
Contig N50 | 17.9 Mb |
Contig L50 | 77 |
Assembly level | Chromosome |
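The N50 and L50 values above follow the usual definitions. With sequences sorted from longest to shortest, N50 is the length of the sequence at which the running total first reaches half of the total assembly size, and L50 is the number of sequences needed to get there. The following is a minimal Python sketch of that calculation. The function name `n50_l50` and the toy lengths are illustrative assumptions, not values taken from the TA1851 assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    # Longest sequences first.
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first sequence that pushes the running total to at least
        # half of the assembly size defines N50; its rank is L50.
        if running >= half:
            return length, count


# Toy example (made-up lengths): total = 30, half = 15.
toy_lengths = [10, 8, 6, 4, 2]
print(n50_l50(toy_lengths))  # (8, 2)
```

Applied to the scaffold or contig lengths of an assembly, the same function should give values comparable to those in the table, although tools differ in details such as whether gaps are counted.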
The Aegilops umbellulata TA1851 assembly file is available in FASTA format.
Downloads
Chromosomes (FASTA file) | AeUmbellulata_TA1851_v1.fasta.gz |
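As a quick check after download, the sequence lengths in the gzipped FASTA file can be tallied with the Python standard library alone. This is a sketch under assumptions: the file is assumed to sit in the working directory under the name listed above, and the helper name `fasta_lengths` is illustrative.

```python
import gzip
import os


def fasta_lengths(lines):
    """Map each FASTA record name to its sequence length.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The record name is the first whitespace-delimited token after '>'.
    """
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Assumed local path; the block is skipped if the file is not present.
path = "AeUmbellulata_TA1851_v1.fasta.gz"
if os.path.exists(path):
    with gzip.open(path, "rt") as handle:
        for name, length in fasta_lengths(handle).items():
            print(name, length)
```

Reading line by line keeps memory use low, which matters for a multi-gigabase assembly.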
The Aegilops umbellulata TA1851 gene prediction files are available in GFF3 and FASTA formats.
Downloads
Genes (GFF3 file) | AeUmbellulata_TA1851_v1.gff3.gz |
CDS sequences (FASTA file) | AeUmbellulata_TA1851_v1.cds.fasta.gz |
Protein sequences (FASTA file) | AeUmbellulata_TA1851_v1.prot.fasta.gz |
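GFF3 is a tab-delimited format with nine columns, the third of which holds the feature type. A minimal sketch for counting features per type in the annotation file follows. It assumes the file is in the working directory; which type labels (for example `gene` or `mRNA`) this particular file uses has not been checked, and the function name `count_feature_types` is illustrative.

```python
import gzip
import os
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 features by type (column 3)."""
    counts = Counter()
    for line in lines:
        # An optional ##FASTA directive marks the start of embedded sequence.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Assumed local path; the block is skipped if the file is not present.
path = "AeUmbellulata_TA1851_v1.gff3.gz"
if os.path.exists(path):
    with gzip.open(path, "rt") as handle:
        for feature_type, n in count_feature_types(handle).most_common():
            print(feature_type, n)
```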
Functional annotation for Aegilops umbellulata TA1851 is available for download below. The predicted proteins were analyzed with InterProScan to assign InterPro domains (Pfam).
Downloads
Domain from InterProScan | Aegilops_umbellulata_TA1851.Pfam.tsv.gz |
Summary
Query | Chromosome | Chromosome size (bp) | Coordinates | tBLASTn Hit | tBLASTn %ID | Domain |
DUF247I-S | chr1U_TA1851 | 494422770 | 95856844-95858439 | LpSDUF247-I_chromosome1 | 81 | DUF247 |
DUF247II-S | chr1U_TA1851 | 494422770 | 95679561-95681180 | LpSDUF247-II_chromosome1 | 75 | DUF247 |
HPS10-S | chr1U_TA1851 | 494422770 | 95855477-95855624,95855704-95855837 | LpsS_contig12948 | 48 | - |
HPS10-Z | chr2U_TA1851 | 646201372 | 512154526-512154682,512154763-512154890 | Bromus_tectorum_HPS10-Z | 63 | - |
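The Coordinates column holds one or more `start-end` ranges separated by commas; the two HPS10 entries each span two segments. A small Python sketch for turning such a string into intervals and summing their length follows. It assumes the coordinates are 1-based and inclusive, which the table does not state, and the function names are illustrative.

```python
def parse_coordinates(text):
    """Parse 'start-end[,start-end...]' into a list of (start, end) tuples."""
    segments = []
    for part in text.split(","):
        start, end = part.strip().split("-")
        segments.append((int(start), int(end)))
    return segments


def covered_length(segments):
    """Total bases covered, ASSUMING 1-based inclusive coordinates."""
    return sum(end - start + 1 for start, end in segments)


# HPS10-S coordinates from the table above.
segments = parse_coordinates("95855477-95855624,95855704-95855837")
print(segments)                  # [(95855477, 95855624), (95855704, 95855837)]
print(covered_length(segments))  # 282 under the inclusive assumption
```

If the coordinates turn out to be 0-based or half-open, the `+ 1` in `covered_length` would need to be adjusted.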